Abstract

Knowledge  of  the  asymptotic  variance  of  an  estimator  is  important  for 
large  sample  inference,  efficiency,  and  as  a  guide  to  the  specification  of 
regularity  conditions.   The  purpose  of  this  paper  is  the  presentation  of  a 
general  formula  for  the  asymptotic  variance  of  a  semiparametric  estimator.   A 
particularly  important  feature  of  this  formula  is  a  way  of  accounting  for  the 
presence  of  nonparametric  estimates  of  nuisance  functions.   The  general  form 
of  an  adjustment  factor  for  nonparametric  estimates  is  derived  and  analyzed. 

The  usefulness  of  the  formula  is  illustrated  by  deriving  propositions  on 
asymptotic  equivalence  for  different  nonparametric  estimators  of  the  same 
function,  conditions  for  estimation  of  the  nuisance  functions  to  have  no 
effect  on  the  asymptotic  variance,  and  the  form  of  a  correction  term  for  the 
presence of a linear function of a conditional expectation estimator, or other
projection  estimator  (e.g.  partially  linear  and/or  additive  nonparametric 
projections),  and  for  a  function  of  a  density.   Specific  results  cover  a 
semiparametric  random  effects  model  for  binary  panel  data,  nonparametric 
consumer  surplus,  nonparametric  prediction,  and  average  derivatives. 
Regularity  conditions  are  given  for  many  of  the  propositions.   These  include 
primitive conditions for $\sqrt{n}$-consistency, asymptotic normality, and consistency
of  an  asymptotic  variance  estimator  with  series  estimators  of  conditional 
expectations  (or  projections),  in  each  of  the  examples. 


Keywords:   semiparametric  estimation,  asymptotic  variance,  nonparametric 
regression,  series  estimation,  panel  data,  consumer  surplus,  average 
derivative. 


1.    Introduction 

This  paper  develops  a  general  form  for  the  asymptotic  variance  of 
semiparametric  estimators.   Despite  the  complicated  nature  of  such  estimators, 
which  can  depend  on  estimators  of  functions,  the  formula  is  straightforward  to 
derive  in  many  cases,  requiring  only  some  calculus.   Although  the  formula  is 
not  based  on  primitive  conditions,  it  should  be  useful  for  semiparametric 
models,  just  as  analogous  formulae  are  for  parametric  models,  such  as  Huber 
(1967)  for  m-estimators.   It  gives  the  form  of  remainder  terms,  which 
facilitates  specification  of  primitive  conditions.   It  also  can  be  used  to 
make  asymptotic  efficiency  comparisons,  in  order  to  find  an  efficient 
estimator  in  some  class. 

The  usefulness  of  this  formula  is  illustrated  in  several  ways.   New 
examples  are  considered  throughout,  in  order  to  emphasize  that  it  can  be  a 
useful  tool  for  further  work  in  semiparametric  estimation,  and  not  just  a  way 
of  "unifying"  existing  results.   A  number  of  Propositions  are  derived,  and 
primitive  conditions  are  given  for  many  of  them.   The  propositions  include 
showing  that  the  method  of  estimating  a  function  (e.g.  kernel  or  polynomial 
regression)  does  not  affect  the  asymptotic  variance  of  the  estimator.   Also, 
two  sufficient  conditions  are  given  for  the  absence  of  an  effect  on  the 
asymptotic  variance  from  the  presence  of  a  function  estimator.   One  is  that 
the  limit  of  the  function  estimator  maximizes  the  same  expected  objective 
function  as  the  population  parameter,  i.e.  the  function  has  been  "concentrated 
out."   The  other  is  a  certain  orthogonality  condition. 

Several  propositions  are  given  on  the  form  of  correction  terms  for  the 
presence  of  function  estimates.   One  has  sufficient  conditions  for  this 
adjustment  to  take  the  form  of  the  projection  on  the  tangent  set  (the 
mean-square closure of all scores for parametric models of the nuisance
functions) for a semiparametric model. More specific results are given for
the  case  of  conditional  expectations,  or  other  mean  square  projections,  and 
for  densities.  A  characterization  of  the  correction  term  for  estimators  of 
linear functions of projections and densities is given, with specific formulas
given  for  semiparametric  individual  effects  regression  for  binary  panel  data, 
nonparametric  consumer  surplus,  and  Stock's  (1989)  nonparametric  prediction 
estimator. 

Regularity conditions for $\sqrt{n}$-consistency and asymptotic normality are
formulated. The discussion is organized around a few "high-level"
assumptions. Time series are covered, including weighted autocovariance
estimation  of  the  asymptotic  variance,  with  data-based  lag  choice.   Primitive 
conditions  are  given  for  power  series  estimators  of  conditional  expectations 
and  other  projections,  including  several  examples. 

The  formula  builds  on  previous  work,  including  that  on  Von  Mises  (1947) 
estimators,  i.e.  functionals  of  the  empirical  distribution,  by  Reeds  (1976), 
Boos and Serfling (1980), and Fernholz (1983). The formula here allows for
explicit dependence on nonparametric function estimators, such as conditional
expectations or densities, which are difficult to allow for in the Gateaux
derivative formula for Von Mises estimators. It is based on calculating the
semiparametric efficiency bound, as in Koshevnik and Levit (1976), Pfanzagl
and Wefelmeyer (1982), and Van der Vaart (1991), for the functional that the
estimator nonparametrically estimates, as discussed in the next section.
Also,  some  of  the  examples  build  on  previous  work  on  semiparametric 
estimation,  including  Bickel,  Klaassen,  Ritov,  and  Wellner  (1990),  Hardle  and 
Stoker  (1989),  Klein  and  Spady  (1987),  Powell,  Stock,  and  Stoker  (1989), 
Robinson  (1988),  Stock  (1989),  and  others  cited  below. 

Section 2 gives the formula for the asymptotic variance. Sections 3 and 4
apply this formula to derive some propositions on the effect of preliminary
nonparametric estimators on the asymptotic variance. Some high-level
regularity conditions are collected in Section 5. Section 6 gives general
regularity conditions for $\sqrt{n}$-consistency and asymptotic normality when an
estimator  depends  on  a  series  estimator  of  a  conditional  expectation  or  other 
projection,  and  Section  7  applies  these  results  to  specify  primitive 
regularity  conditions  for  several  examples. 


2.    The  Pathwise  Derivative  Formula  for  the  Asymptotic  Variance 

The formula is based on the observation that $\sqrt{n}$-consistent nonparametric
estimators are often efficient. For example, the sample mean is known to be
an efficient estimator of the population mean in a nonparametric model where
no restrictions, other than regularity conditions (e.g. existence of the
second moment), are placed on the distribution of the data. The idea here is
to use this observation to calculate the asymptotic variance of a
semiparametric estimator, by finding the functional that it nonparametrically
estimates, i.e. the object that it converges to under general misspecification,
and calculating the semiparametric variance bound for this functional.

To be more precise, let $\hat\beta$ be an estimator, and suppose that one can
associate with it the triple,

(2.1)   $z$: finite dimensional data vector,
        $\mathcal{F} = \{F_z\}$: unrestricted family of distributions of $z$,
        $\mu : \mathcal{F} \to \mathbb{R}^q$; $\mu(F_z) = \mathrm{plim}(\hat\beta)$ when $F_z$ is true.

That is, $\hat\beta$ is a nonparametric estimator of $\mu(F_z)$, having this as its
probability limit for all distributions of $z$ belonging to a family that is
unrestricted, except for regularity conditions. In other words, $\mu(F_z)$ is the
object estimated by $\hat\beta$ under general misspecification, when the distribution
of $z$ does not necessarily satisfy restrictions on which $\hat\beta$ is based. The
asymptotic variance formula discussed here is taken to be the variance bound
for estimation of $\mu(F_z)$, $F_z \in \mathcal{F}$. This formula is an alternative to
the Gateaux derivative for Von Mises estimators, because the domain of $\mu(F_z)$
need not include all distributions, e.g. so that $\mu(F_z)$ can depend explicitly
on a density function. In the technical conditions to follow, this feature of
the formula results from $F_z$ having a density with respect to a measure for
which the true distribution also has a density.

The formula for calculating the variance bound for $\mu(F_z)$ is that given
in previous work by Koshevnik and Levit (1976), Pfanzagl and Wefelmeyer
(1982), and others. Following Van der Vaart (1991), let
$\{F_{z\theta} : \theta \in (0,\epsilon) \subset \mathbb{R},\ \epsilon > 0,\ F_{z\theta} \in \mathcal{F}\}$
denote a one-dimensional subfamily of $\mathcal{F}$, i.e. a path in $\mathcal{F}$, such that
the true distribution and each member of this subfamily are absolutely
continuous with respect to the same $\sigma$-finite measure. Let $E[\cdot]$ be the
expectation under the true distribution $F_{z0}$, and let $dF_{z\theta}$ and $dF_{z0}$
be the densities with respect to the common dominating measure, and $dz$
integration with respect to that measure. Let $T$ denote a set of paths such
that for each one there is a random variable $S_\theta(z)$ with
$E[S_\theta(z)^2] < \infty$ and

$\int [\theta^{-1}(dF_{z\theta}^{1/2} - dF_{z0}^{1/2}) - \tfrac{1}{2} S_\theta(z)\, dF_{z0}^{1/2}]^2 dz \to 0$, as $\theta \to 0$.

Here $S_\theta(z)$ is a "mean-square version" of the score
$\partial \ln(dF_{z\theta})/\partial\theta|_{\theta=0}$
associated with the path, which quantifies a direction of departure from the
truth allowed by $\mathcal{F}$. The requirement that $\mathcal{F}$ be unrestricted is formalized
in the condition that there is a set of paths $T$ with associated set of
scores $\mathcal{S}$ satisfying the following property:

Assumption 2.1: $\mathcal{S}$ is linear and for any $s(z)$ with $E[s(z)] = 0$,
$E[s(z)^2] < \infty$, and any $\epsilon > 0$ there is $S_\theta(z) \in \mathcal{S}$ such that
$E[\{s(z) - S_\theta(z)\}^2] < \epsilon$.

That is, the mean-square closure of the set of scores is all mean-zero random
variables, i.e. $\mathcal{F}$ allows for any direction of departure from the truth.

The functional $\mu(F_z)$ is pathwise differentiable if there is a mapping
$\mu_T(S_\theta) : \mathcal{S} \to \mathbb{R}^q$ that is linear and mean-square continuous with
respect to the true distribution (i.e. for every $\epsilon > 0$ there exists
$\delta > 0$ such that $\|\mu_T(S_\theta)\| < \epsilon$ if $E[S_\theta(z)^2] < \delta$), such that
for each path,

(2.2)   $\theta^{-1}[\mu(F_{z\theta}) - \mu(F_{z0})] \to \mu_T(S_\theta)$ as $\theta \to 0$,

i.e. the derivative from the right of $\mu(F_{z\theta})$ at the truth ($\theta = 0$) is
$\mu_T(S_\theta)$. The linearity and mean-square continuity of $\mu_T(S_\theta)$,
Assumption 2.1, and the Riesz representation theorem imply the existence of a
unique (up to the usual a.s. equivalence) random vector $d(z)$, the pathwise
derivative, such that $E[d(z)] = 0$, $E[\|d(z)\|^2] < \infty$, and

(2.3)   $\mu_T(S_\theta) = E[d(z) S_\theta(z)]$.


Under Assumption 2.1 and with i.i.d. data the asymptotic variance bound for
estimators of $\mu(F_z)$ is $E[d(z)d(z)']$. Hence, the formula for the asymptotic
variance of $\hat\beta$ suggested here is the variance of the pathwise derivative of
the functional that $\hat\beta$ nonparametrically estimates.
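
As a check on the formula, consider the functional estimated by the sample
mean, which was used above as the motivating example. The following is only a
sketch, assuming scalar $z$ with finite second moment and that differentiation
under the integral is allowed; the path and score notation is that of
equations (2.2) and (2.3).

```latex
\begin{align*}
\mu(F_{z\theta}) &= \int z \, dF_{z\theta}, \\
\left.\frac{\partial \mu(F_{z\theta})}{\partial\theta}\right|_{\theta=0}
  &= E[z\,S_\theta(z)] = E[(z - E[z])\,S_\theta(z)],
  \quad \text{since } E[S_\theta(z)] = 0, \\
d(z) &= z - E[z], \qquad E[d(z)^2] = \mathrm{Var}(z).
\end{align*}
```

The implied asymptotic variance is $\mathrm{Var}(z)$, which agrees with the
familiar limiting variance of the sample mean.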

A stronger justification for regarding the pathwise derivative of $\mu(F_z)$
as a correct formula for the asymptotic variance of $\hat\beta$ is available when
$\hat\beta$ is asymptotically equivalent to a sample average. Define $\hat\beta$ to be
asymptotically linear with influence function $\psi(z)$ if at the truth,

(2.4)   $\sqrt{n}(\hat\beta - \beta_0) = \sum_{i=1}^n \psi(z_i)/\sqrt{n} + o_p(1)$,   $E[\psi(z)] = 0$,   $\mathrm{Var}(\psi(z))$ finite.

This condition is satisfied by many semiparametric estimators, under
sufficient regularity conditions. For i.i.d. data, asymptotic linearity and
the central limit theorem imply $\hat\beta$ is asymptotically normal with variance
$\mathrm{Var}(\psi(z))$. Define $\hat\beta$ to be a regular estimator of $\mu(F_z)$ if for any
path in $T$, and $\theta_n = O(1/\sqrt{n})$, when $z_i$ has distribution $F_{z\theta_n}$,
$\sqrt{n}(\hat\beta - \mu(F_{z\theta_n}))$ has a limiting distribution that does not depend on
$\{\theta_n\}_{n=1}^\infty$ or on the path. Regularity is the precise condition that
specifies that $\hat\beta$ is a nonparametric estimator of $\mu(F_z)$, because it
requires that $\hat\beta$ is asymptotically, locally consistent for $\mu(F_z)$.

Theorem 2.1: Suppose that $z_1, z_2, \ldots$ are i.i.d., $\hat\beta$ is asymptotically
linear and regular for $T$, and Assumption 2.1 is satisfied. Then $\mu(F_z)$ is
pathwise differentiable and $\psi(z) = d(z)$.

What seems to be novel here is the idea of applying this result to the
functional $\mu(F_z)$ that is nonparametrically estimated by $\hat\beta$. The
fact that asymptotic linearity and regularity imply pathwise differentiability
follows by Van der Vaart (1991, Theorem 2.1), and the fact that Assumption 2.1
implies that there is only one influence function, and that it equals the
pathwise derivative, is a small additional step that has been discussed in
Newey (1990a).

This result can also be used to detect whether an estimator is
$\sqrt{n}$-consistent. As shown by Van der Vaart (1991), if equation (2.2) is
satisfied but $\mu_T(S_\theta)$ is not mean-square continuous (i.e. $d(z)$ satisfying
equation (2.3) does not exist) then no $\sqrt{n}$-consistent, regular estimator
exists. For example, the value of a density function at a point does not have
a mean-square continuous derivative, and neither does the functional that is
nonparametrically estimated by Manski's (1975) maximum score estimator. The
pathwise derivative does not help in finding the asymptotic distribution (at a
slower than $\sqrt{n}$ rate) of such estimators, which can be quite complicated:
e.g. see Kim and Pollard (1989).

The hypotheses of Theorem 2.1 are not primitive, but the point of Theorem
2.1 is to formalize the statement that "under sufficient regularity
conditions" the influence function of a semiparametric estimator is the
pathwise derivative of the functional that is nonparametrically estimated by
$\hat\beta$. In Sections 3 and 4, this result and some pathwise derivative
calculations are used to derive propositions about semiparametric estimators.
These results are labeled as "propositions" because primitive conditions for
their validity are not given in Sections 3 and 4. They might also be labeled
as "conjectures," although this word does not convey the same sense that the
validity of the results only requires regularity conditions. In Sections 3
and 4, the solution to equation (2.3) is calculated using the chain rule of
calculus, differentiation under integrals, integration, and
$\partial \int a(z)\,dF_\theta / \partial\theta |_{\theta=0} = E[a(z)S_\theta(z)]$
for $a(z)$ with finite mean square wherever needed, and then
in Sections 5-7 conditions for implied remainder terms to be small are
given. This approach, with formal calculation followed by regularity
conditions, is similar to that used in parametric asymptotic theory (e.g. for
Edgeworth expansions), and is meant to illustrate the usefulness of the
pathwise derivative calculation.
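
The score identity used in these calculations, that the derivative of
$\int a(z)\,dF_\theta$ at the truth equals $E[a(z)S_\theta(z)]$, can be checked
numerically on a finite support. The sketch below is only illustrative: the
three-point support, the probabilities, the direction of departure, and the
function names are all assumptions made for the example, and the path
$p_\theta(z) = p_0(z)(1 + \theta s(z))$ is one convenient choice among many.

```python
# Check d/dtheta sum_z a(z) p_theta(z) at theta = 0 against E[a(z) S(z)].
# All numbers below are made up for illustration.

def path_expectation(a, p0, s, theta):
    """E_theta[a(z)] under the path p_theta(z) = p0(z) * (1 + theta * s(z))."""
    return sum(a_k * p_k * (1.0 + theta * s_k) for a_k, p_k, s_k in zip(a, p0, s))

def score_inner_product(a, p0, s):
    """E[a(z) S(z)] under the true distribution p0."""
    return sum(a_k * s_k * p_k for a_k, p_k, s_k in zip(a, p0, s))

p0 = [0.2, 0.5, 0.3]             # true probabilities on three support points
raw = [1.0, -2.0, 0.5]           # arbitrary direction of departure
mean_raw = sum(r * p for r, p in zip(raw, p0))
s = [r - mean_raw for r in raw]  # centred so that E[S(z)] = 0
a = [3.0, 1.0, 4.0]              # a(z) evaluated on the support

theta = 1e-3
finite_diff = (path_expectation(a, p0, s, theta)
               - path_expectation(a, p0, s, 0.0)) / theta
exact = score_inner_product(a, p0, s)
```

Because this path is linear in $\theta$, the finite difference `finite_diff`
and the inner product `exact` agree up to rounding error.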


3.    Semiparametric M-Estimators

The rest of the paper will focus on a class of semiparametric m-estimators,
obtained from moment conditions that can depend on estimated functions. Let
$m(z,\beta,h)$ be a vector of functions with the same dimension as $\beta$, depending
on a data observation $z$ and a vector of unknown functions $h$. Let $\hat h(\beta)$
denote an estimator of $h$, with corresponding $m(z,\beta,\hat h(\beta))$. A semiparametric
m-estimator $\hat\beta$ is one which solves an asymptotic moment equation

(3.1)   $\sum_{i=1}^n m(z_i, \hat\beta, \hat h(\hat\beta))/n = 0$.

The general idea here is that $\hat\beta$ is obtained by a procedure that "plugs in"
an estimated function $\hat h(\beta)$, that can depend on $\beta$.

An  early  and  important  example  is  the  Buckley  and  James  (1979)  estimator 
for  censored  regression.   Other  examples  are  Robinson's  (1988)  semiparametric 
regression  estimator  and  Powell,  Stock,  and  Stoker's  (1989)  weighted  average 
derivative estimator. For a new example, consider a semi-linear model with
additive nonparametric component, $E[y|x,v] = x'\beta + \rho_1(v_1) + \rho_2(v_2)$, where
$v = (v_1, v_2)$. The motivation for this model is that if $v$ is high
dimensional the asymptotic properties of $\hat\beta$ could be adversely affected if
additivity of $\rho_1(v_1) + \rho_2(v_2)$ is true but not imposed: see Section 4 for
further discussion. Assume that the set of additive functions in $v_1$ and $v_2$
with finite mean-square is closed in mean square, and let $\Pi(\cdot|v_1,v_2)$ denote
the mean-square (Hilbert space) projection on this set. Also, let $\hat\Pi(\cdot|v_1,v_2)$
denote an estimator of this projection, such as the series estimator
considered in Stone (1985) and in Section 6, or the alternating conditional
expectation estimator in Breiman and Friedman (1985). Consider

(3.2)   $\hat\beta = \arg\min_\beta \sum_{i=1}^n [y_i - x_i'\beta - \hat h(v_i,\beta)]^2/2$,
        $\hat h(v,\beta) = \hat\Pi(y|v_1,v_2) - \hat\Pi(x|v_1,v_2)'\beta$.

This is a semiparametric m-estimator with $m(z,\beta,h(\beta))
= [x + \partial h(v,\beta)/\partial\beta][y - x'\beta - h(v,\beta)]$.

It is possible, at the level of generality of equation (3.1), to derive a
number of propositions. To use the pathwise derivative formula in this
derivation, it is necessary to identify the functional that is
nonparametrically estimated by $\hat\beta$. Let $h(\beta,F)$ denote the limit of $\hat h(\beta)$
for a general distribution $F = F_z$, where the $z$ subscript is suppressed
henceforth for notational convenience. By the usual method of moments
reasoning, the limit $\mu(F)$ of $\hat\beta$ for a general $F$ should be the solution to

(3.3)   $E_F[m(z,\mu,h(\mu,F))] = 0$.

That is, equation (3.1) sets $\hat\beta$ so that sample moments are zero, and the
sample moments have a limit of $E_F[m(z,\beta,h(\beta,F))]$ (by the law of large
numbers and $h(\beta,F)$ equal to the limit of $\hat h(\beta)$), so that $\hat\beta$ is consistent
for that value of $\beta$ that sets the population moments to zero.

Before computing the pathwise derivative, it is interesting to note that
it will depend only on the limit $h(\beta,F)$, and not on the particular form of
the estimator $\hat h(\beta)$. Thus, different nonparametric estimators of the same
functions should result in the same asymptotic variance. For example, this
reasoning explains why replacing the kernel estimator of Robinson (1988) by
series estimators gives an asymptotically equivalent estimator, as shown by
Newey (1990b), and suggests that for estimation of the additive model above,
the asymptotic distribution is invariant to the estimator of the projection.
Also, two estimators may not be asymptotically equivalent if their nuisance
function estimators estimate different objects nonparametrically.

Proposition 1: The asymptotic variance of semiparametric estimators depends
only on the function that is nonparametrically estimated, and not on the type
of estimator (such as kernel or series nonparametric regression).

To obtain more results, it is useful to be more specific about the form
of the pathwise derivative. Suppose that $h$ has $J$ components, $h =
(h_1, \ldots, h_J)$. For a path $F_\theta$, equal to the truth when $\theta = \theta_0 = 0$, let
$E_\theta[\cdot]$ denote the expectation with respect to $F_\theta$,
$h_j(\beta,\theta) = h_j(\beta,F_\theta)$, $h_j(\beta) = h_j(\beta,\theta_0)$,
$h_j(\theta) = h_j(\beta_0,\theta)$, and let the same expressions without the $j$
subscript denote corresponding vectors. For a path, $\mu(\theta)$ will be the
functional satisfying the parametric version of equation (3.3),

(3.4)   $E_\theta[m(z,\mu,h(\mu,\theta))] = 0$.

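Then for $m(z,h(\theta)) = m(z,\beta_0,h(\theta))$, differentiation gives

$\partial E_\theta[m(z,h(\theta_0))]/\partial\theta|_{\theta_0} = \int m(z,h(\theta_0))[\partial dF_\theta/\partial\theta]dz|_{\theta_0} = E[m(z,h(\theta_0))S_\theta(z)]$.

Then, applying the chain rule to $E_{\theta_1}[m(z,h(\theta_2))]$, it follows that

$\partial E_\theta[m(z,h(\theta))]/\partial\theta|_{\theta_0} = E[m(z,h(\theta_0))S_\theta(z)] + \partial E[m(z,h(\theta))]/\partial\theta|_{\theta_0}$.

Assuming $D = \partial E[m(z,\beta,h(\beta,\theta_0))]/\partial\beta|_{\beta_0}$ is nonsingular, the implicit
function theorem gives

(3.5)   $\partial\mu(\theta)/\partial\theta|_{\theta_0} = -D^{-1}\{E[m(z,h(\theta_0))S_\theta(z)] + \partial E[m(z,h(\theta))]/\partial\theta|_{\theta_0}\}$.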

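The first term is already in outer product form of equation (2.3), so that the
pathwise derivative will exist if the second term can be put in a similar
form. Suppose there are $\alpha_j(z)$ such that for each $j = 1, \ldots, J$,

(3.6)   $\partial E[m(z,h_1(\theta_0),\ldots,h_j(\theta),\ldots,h_J(\theta_0))]/\partial\theta|_{\theta_0} = E[\alpha_j(z)S_\theta(z)]$.

Then, applying the chain rule to $E[m(z,h_1(\theta_1),\ldots,h_j(\theta_j),\ldots,h_J(\theta_J))]$ with
each $\theta_j$ equal to $\theta$, it follows that

$\partial E[m(z,h(\theta))]/\partial\theta|_{\theta_0} = \sum_{j=1}^J \partial E[m(z,h_1(\theta_0),\ldots,h_j(\theta),\ldots,h_J(\theta_0))]/\partial\theta|_{\theta_0}$
        $= \sum_{j=1}^J E[\alpha_j(z)S_\theta(z)] = E[\{\sum_{j=1}^J \alpha_j(z)\}S_\theta(z)]$,

giving the outer product form. Then, moving $-D^{-1}$ inside the expectation,
and noting that each $\alpha_j(z)$ can be centered at its mean because
$E[S_\theta(z)] = 0$, it follows that the pathwise derivative is
$d(z) = -D^{-1}[m(z,h(\theta_0)) + \sum_{j=1}^J \{\alpha_j(z) - E[\alpha_j(z)]\}]$,
so that by Theorem 2.1 the influence function of $\hat\beta$ equals

(3.7)   $\psi(z) = -D^{-1}\{m(z,\beta_0,h(\beta_0)) + \sum_{j=1}^J [\alpha_j(z) - E[\alpha_j(z)]]\}$.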


This influence function has an interesting structure. The leading term
$-D^{-1}m(z,\beta_0,h(\beta_0))$ is the usual Huber (1967) formula for the influence
function of an m-estimator with moment functions $m(z,\beta,h(\beta))$, i.e. the
formula that would be obtained if $\hat h(\beta)$ were equal to $h(\beta)$. Thus, the
second term is an adjustment term for the estimation of $h(\beta)$, a
nonparametric analog of adjustments that are familiar for two-step parametric
estimators. It can also be interpreted as the pathwise derivative of the
functional $-D^{-1}E[m(z,\beta_0,h(\beta_0,F))]$, or as the influence function of
$-D^{-1}\int m(z,\beta_0,\hat h(\beta_0))dF(z)$. Furthermore, the adjustment contains exactly
one term for each component of $h$, and the $j$th adjustment can be interpreted
as the pathwise derivative of
$E[m(z,\beta_0,h_1(\beta_0),\ldots,h_j(\beta_0,F),\ldots,h_J(\beta_0))]$. This
property is useful, because the adjustment terms can be calculated for each
function $h_j$, holding the other functions fixed at their true values, and
then the total adjustment formed as the sum. For this reason the $j$
subscript will be dropped in the rest of Sections 3 and 4, with the
understanding that the results can be applied to individual $h_j$ terms, and
then combined to derive the total adjustment (e.g. when some adjustment terms
are zero and others are not).
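
The size of the adjustment term can be seen in a deliberately simple
simulation. The sketch below is an illustration only, not one of the paper's
examples: it assumes the first step is the scalar mean $h_0 = E[z]$ rather than
a function, takes $m(z,\beta,h) = zh - \beta$ so that $\beta_0 = E[z]^2$ and
$D = -1$, and uses normal data. Under these assumptions $\alpha(z) = h_0 z$, so
equation (3.7) gives $\psi(z) = 2h_0(z - h_0)$, while ignoring the first step
would give $h_0(z - h_0)$. The function name `simulate_plugin` is hypothetical.

```python
import random
import statistics

def simulate_plugin(n, reps, mu, sigma, seed=0):
    """Monte Carlo variance of sqrt(n) * (beta_hat - beta_0) for the plug-in estimator."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = [rng.gauss(mu, sigma) for _ in range(n)]
        h_hat = sum(z) / n  # first-step estimate of h_0 = E[z]
        # beta_hat solves sum_i m(z_i, beta, h_hat) / n = 0 with m = z * h - beta
        beta_hat = sum(z_i * h_hat for z_i in z) / n
        draws.append(n ** 0.5 * (beta_hat - mu ** 2))
    return statistics.pvariance(draws)

mu, sigma = 1.0, 1.0
mc_var = simulate_plugin(n=200, reps=2000, mu=mu, sigma=sigma)
naive_var = mu ** 2 * sigma ** 2         # Var of h_0 * (z - h_0): first step ignored
adjusted_var = 4 * mu ** 2 * sigma ** 2  # Var of 2 * h_0 * (z - h_0): equation (3.7)
```

In this design the simulated variance `mc_var` should be close to
`adjusted_var` and far from `naive_var`, which is the familiar delta-method
result recovered through the adjustment term.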

It is useful to know when an adjustment term is zero. In such cases, it
should not be necessary to account for the presence of $\hat h(\beta)$, i.e. $\hat h(\beta)$ can
be treated as if it were equal to $h(\beta)$, greatly simplifying the calculation
of the asymptotic variance and finding a consistent estimator of it. One
case where an adjustment term will be zero is when equation (3.1) is the
first-order condition to a maximization problem, and $\hat h(\beta)$ has a limit that
maximizes the population value of the same function. To be specific, suppose
that there is a function $q(z,\beta,h(\beta))$ and a set of functions $H(\beta)$, possibly
depending on $\beta$ but not on the distribution $F$ of $z$, such that

(3.8)   $m(z,\beta,h(\beta)) = \partial q(z,\beta,h(\beta))/\partial\beta$,   $h(\beta,F) = \arg\max_{h \in H(\beta)} E_F[q(z,\beta,h(\beta))]$.

The interpretation of this condition is that $m(z,\beta,h(\beta))$ are the first order
conditions for a stationary point of the function $q$ and that $h(\beta,F)$
maximizes the expected value of the same function, i.e. that $h(\beta,F)$ has been
"concentrated out." Then for any parametric model $F_\theta$, since
$h(\beta,\theta) = h(\beta,F_\theta)$ is an element of $H(\beta)$, it follows that
$E[q(z,\beta,h(\beta,\theta))]$ is maximized at $\theta_0$. The first
order conditions for this maximization are
$\partial E[q(z,\beta,h(\beta,\theta))]/\partial\theta|_{\theta_0} = 0$,
identically in $\beta$. Differentiating again with respect to $\beta$,

(3.9)   $0 = \partial^2 E[q(z,\beta,h(\beta,\theta))]/\partial\theta\partial\beta|_{\theta_0} = \partial E[\partial q(z,\beta,h(\beta,\theta))/\partial\beta]/\partial\theta|_{\theta_0}$
          $= \partial E[m(z,\beta,h(\beta,\theta))]/\partial\theta|_{\theta_0}$.

Evaluating this equation at $\beta_0$, it follows that $\alpha(z) = 0$ will solve
equation (3.6), and hence the adjustment term is zero. Summarizing:

Proposition 2: If equation (3.8) is satisfied, then the estimation of $h$ can
be ignored in calculating the asymptotic variance, i.e. it is the same as if
$\hat h(\beta) = h(\beta)$.

Examples of estimators that satisfy the hypotheses of this proposition
are those of Robinson (1988), Ichimura (1987), Klein and Spady (1987), and Ai
(1990). A new example is the additive semi-linear estimator of equation
(3.2). Suppose that the set $H$ of additive functions is closed under
any $F \in \mathcal{F}$ and is invariant to $F$, and let $\Pi_F(\cdot|v_1,v_2)$ denote the
projection under $F$. Then $\hat h(\beta)$ is a nonparametric estimator of
$\Pi_F(y|v_1,v_2) - \Pi_F(x|v_1,v_2)'\beta = \Pi_F(y - x'\beta|v_1,v_2)$, which minimizes
$E_F[(y - x'\beta - h(v,\beta))^2]$,
the same objective function minimized by the limit of $\hat\beta$. Therefore, by
Proposition 2, estimation of $\Pi_F(y|v_1,v_2)$ and $\Pi_F(x|v_1,v_2)$ should have no
effect on the asymptotic variance of $\hat\beta$. Thus, for
$\varepsilon = y - x'\beta_0 - \rho_1(v_1) - \rho_2(v_2)$,
the formula for the influence function is

(3.10)   $\psi(z) = (E[\{x - \Pi[x|v_1,v_2]\}\{x - \Pi[x|v_1,v_2]\}'])^{-1}\{x - \Pi[x|v_1,v_2]\}\varepsilon$.

Primitive conditions for this result are given in Newey (1991), and somewhat
weaker conditions could be formulated using the results of Section 6.
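
A small Monte Carlo experiment can illustrate the claim that the first-step
estimates do not affect the asymptotic variance. The sketch below is an
assumption-laden simplification rather than the additive estimator itself: it
uses a scalar regressor, a single discrete $v$ taking three values (so that
cell means estimate the unrestricted conditional expectations, as in
Robinson's (1988) setting), and homoskedastic unit-variance errors, for which
the variance implied by the form of equation (3.10) is 1. The function names
and the design are hypothetical.

```python
import random
import statistics

def robinson_discrete(n, beta0, rng):
    """One draw of the partially linear estimator with cell-mean first steps."""
    v = [rng.randrange(3) for _ in range(n)]
    x = [v_i + rng.gauss(0.0, 1.0) for v_i in v]             # E[x|v] = v
    y = [beta0 * x_i + (v_i - 1) ** 2 + rng.gauss(0.0, 1.0)  # rho(v) = (v - 1)^2
         for x_i, v_i in zip(x, v)]
    # Cell means estimate E[x|v] and E[y|v] nonparametrically.
    x_bar, y_bar = {}, {}
    for cell in set(v):
        idx = [i for i in range(n) if v[i] == cell]
        x_bar[cell] = sum(x[i] for i in idx) / len(idx)
        y_bar[cell] = sum(y[i] for i in idx) / len(idx)
    x_res = [x[i] - x_bar[v[i]] for i in range(n)]
    y_res = [y[i] - y_bar[v[i]] for i in range(n)]
    # Least squares of the y residuals on the x residuals.
    return sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)

def scaled_mc_variance(n=300, reps=1000, beta0=2.0, seed=0):
    """Monte Carlo variance of sqrt(n) * (beta_hat - beta0)."""
    rng = random.Random(seed)
    draws = [n ** 0.5 * (robinson_discrete(n, beta0, rng) - beta0)
             for _ in range(reps)]
    return statistics.pvariance(draws)

mc_var = scaled_mc_variance()
formula_var = 1.0  # E[u^2 eps^2] / (E[u^2])^2 with unit-variance u and eps
```

Under these assumptions `mc_var` should be close to `formula_var`, with no
additional term for the estimation of the cell means.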

There is another, more direct condition under which estimation of the
nuisance function does not affect the asymptotic variance. To formulate this
condition, suppose that $m(z,h)$ depends on $h$ only through the value it
takes on as a function $h(v)$ of a subvector $v$ of $z$, i.e. $h$ is a real
vector argument in $m(z,h)$. The additive semi-linear example has this
property if $h(\beta)$ is redefined to include $\Pi[x|v_1,v_2]$. Let $h(v,\theta)$ denote
the limiting value of $\hat h(v,\beta_0)$ for a path. For
$M(z) = \partial m(z,\beta_0,h)/\partial h|_{h = h(v,\theta_0)}$,
differentiation gives

(3.11)   $\partial E[m(z,h(\theta))]/\partial\theta|_{\theta_0} = E[M(z)\partial h(v,\theta)/\partial\theta|_{\theta_0}] = \partial E[M(z)h(v,\theta)]/\partial\theta|_{\theta_0}$.

If the term on the right-hand side is zero, then $\alpha(z) = 0$ will solve
equation (3.6), and the adjustment term is zero. One simple condition for
this is that $E[M(z)|v] = 0$. More generally, the adjustment term will be zero
if $h(v,\theta)$ is an element of a set to which $M(z)$ is orthogonal.

Proposition 3: If $E[M(z)|v] = 0$, or more generally $h(v,F)$ is an element
of a set $H$ such that $E[M(z)h(v)] = 0$ for all $h \in H$, then estimation of
$h$ can be ignored in calculating the asymptotic variance.

The  semi-linear,  additive  model  is  also  an  example  here. 
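
To see why, the orthogonality condition can be checked directly. The following
is a sketch under simplifying assumptions: $x$ is taken to be scalar, the
labels $h_y = \Pi(y|v_1,v_2)$ and $h_x = \Pi(x|v_1,v_2)$ are introduced here
only for the illustration, $H$ is the set of additive functions of
$(v_1,v_2)$, and $\varepsilon = y - x\beta_0 - \rho_1(v_1) - \rho_2(v_2)$
with $E[\varepsilon|x,v] = 0$.

```latex
\begin{align*}
m(z,h) &= (x - h_x(v))\,\bigl(y - h_y(v) - (x - h_x(v))\beta_0\bigr), \\
M_y(z) &= \partial m/\partial h_y = -(x - h_x(v)), \\
M_x(z) &= \partial m/\partial h_x = -\varepsilon + (x - h_x(v))\beta_0, \\
E[M_y(z)h(v)] &= -E[(x - \Pi(x|v_1,v_2))\,h(v)] = 0, \\
E[M_x(z)h(v)] &= -E[\varepsilon\,h(v)] + \beta_0 E[(x - \Pi(x|v_1,v_2))\,h(v)] = 0
  \quad \text{for all } h \in H.
\end{align*}
```

Both derivatives are orthogonal to every additive function, because the
projection residual $x - \Pi(x|v_1,v_2)$ is orthogonal to $H$ by construction
and $\varepsilon$ is mean independent of $(x,v)$.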

In cases where the correction term is nonzero, its form will depend on
the limit of $\hat h(\beta)$. Therefore, it is difficult to give a general
characterization of the correction term. One result that does not depend on
completely specifying the form of $h$ can be obtained in semiparametric models
where the data $z_1, \ldots, z_n$ are i.i.d. and $z_i$ is restricted to have a
density function of the form $f(z|\beta,g)$, where $g$ is a nonparametric
(functional) component. Let
$S_\eta(z) = \partial \ln f(z|\beta_0,g(\eta))/\partial\eta|_{\eta_0}$ denote the score
for a finite-dimensional parameterization of $g$ with $g(\eta_0) = g_0$ ($g_0$ is
the truth), and let $S_\beta(z) = \partial \ln f(z|\beta,g_0)/\partial\beta|_{\beta_0}$. Also, let $A$ denote a
constant matrix with number of rows equal to the number of elements of $\beta$.
The tangent set $\mathcal{T}$ is defined as the mean-square closure of the set of all
linear combinations $AS_\eta(z)$. The tangent set is useful in calculating the
asymptotic variance bound for estimators of $\beta$ in the semiparametric model
$f(z|\beta,g)$. The form of this bound is $V = (E[S(z)S(z)'])^{-1}$, where $S(z) =
S_\beta(z) - \Pi(S_\beta(z)|\mathcal{T})$ and $\Pi(\cdot|\mathcal{T})$ denotes the mean-square projection on the
tangent set. See, for example, Newey (1990a) for further discussion.

Under certain conditions, $\alpha(z) = -\Pi(m(z)|\mathcal{T})$ will solve equation (3.6),
for $m(z) = m(z,\beta_0,h(\beta_0))$, so that the correction term can be calculated from
this projection. Let $\theta$ be the parameter of an unrestricted path, as
discussed in Section 2 ($\theta$ does not have anything to do with $\beta$ or $\eta$).
Suppose that there is $g(\theta)$ such that,

(3.12)   $\int m(z,h(\theta))f(z|\beta_0,g(\theta))dz = 0$.

In words, for the limit of $\hat h(\beta_0)$ under a general distribution there is a
corresponding value of the nonparametric component of the semiparametric model
where the population moment conditions (corresponding to equation (3.1)) are
satisfied. Let $S_{g\theta}(z) = \partial \ln f(z|\beta_0,g(\theta))/\partial\theta$, and note that
$S_{g\theta}(z)$ is an element of the tangent set, implying
$E[m(z)S_{g\theta}(z)] = E[\Pi(m(z)|\mathcal{T})S_{g\theta}(z)]$.
Suppose that $S_{g\theta}(z) = \Pi(S_\theta(z)|\mathcal{T})$. Then differentiating equation (3.12)
with respect to $\theta$,

$\partial E[m(z,h(\theta))]/\partial\theta|_{\theta_0} = -E[m(z)S_{g\theta}(z)] = -E[m(z)\Pi(S_\theta(z)|\mathcal{T})]$
        $= -E[\Pi(m(z)|\mathcal{T})\Pi(S_\theta(z)|\mathcal{T})] = E[-\Pi(m(z)|\mathcal{T})S_\theta(z)]$.

Thus, under the previous conditions, $\alpha(z) = -\Pi(m(z)|\mathcal{T})$ satisfies equation
(3.6). Summarizing:


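Proposition 4: If for all unrestricted paths $F_{z\theta}$ there exists $g(\theta)$ such
that equation (3.12) is satisfied, and
$\partial \ln f(z|\beta_0,g(\theta))/\partial\theta = \Pi(S_\theta(z)|\mathcal{T})$,
then $\alpha(z) = -\Pi(m(z,\beta_0,h(\beta_0))|\mathcal{T})$ and the influence function of $\hat\beta$ is
$-D^{-1}[m(z,\beta_0,h(\beta_0)) - \Pi(m(z,\beta_0,h(\beta_0))|\mathcal{T})]$.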


This  form  of  the  correction  term  has  previously  been  derived  by  Bickel, 
Klaassen,  Ritov,  and  Wellner  (1990)  and  Newey  (1990a).   The  contribution  of 
Proposition  4  is  to  give  a  general  formulation  for  this  result  in  terms  of  the 
pathwise  derivative  calculation  developed  in  Section  2. 

This result leads to a sufficient condition for asymptotic efficiency of
a semiparametric m-estimator, that the hypotheses of Proposition 4 are
satisfied and $m(z) = S_\beta(z)$. In this case, $m(z) + \alpha(z) - E[\alpha] = S_\beta(z) -
\Pi(S_\beta(z)|\mathcal{T}) = S(z)$. Furthermore, any semiparametric m-estimator that is
regular under the semiparametric model $f(z|\beta,g)$ and has influence function
$-D^{-1}(m(z)+\alpha(z)-E[\alpha])$ will satisfy


(3.13)    $D = -E[(m(z)+\alpha(z)-E[\alpha])S(z)'],$


as discussed in Newey (1990a). Thus, the influence function of $\hat\beta$ is
$(E[S(z)S(z)'])^{-1}S(z) = VS(z)$, with corresponding asymptotic variance
$VE[S(z)S(z)']V = V$, which equals the lower bound.


4.   Functions  of  Mean-Square  Projections  and  Densities 

In  this  section,  the  form  of  the  correction  term  is  derived  when  the 
nuisance  functions  are  linear  functions  of  conditional  expectations  or  other 
mean-square  projections,  such  as  additive  or  partially  linear  regressions, 
and for densities. Let $y$ be a random variable with finite second moment and
$x$ an $r \times 1$ vector. Let $\mathcal{G}$ denote a linear set of functions of $x$ that
is closed in mean-square and $g(x)$ denote the least squares (Hilbert-space)
projection of $y$ on $\mathcal{G}$, that is $g(x) = \arg\min_{\tilde g\in\mathcal{G}} E[(y-\tilde g(x))^2]$. One $h(v)$
considered in this section will be $h(v) = A(v,g)$, where $A$ is a linear
function of $g$, and $v$ is a subvector of $z$.

The simplest nonparametric example of a projection is $g(x) = E[y|x]$,
where $\mathcal{G}$ is all measurable functions of $x$ with finite mean-square. A more
general example is a projection on

with each $x_\ell$ a subvector of $x$. This is a smaller set of functions, whose
consideration  is  motivated  partly  by  the  difficulty  of  estimating  conditional 
expectations  for  x  with  many  dimensions;  e.g.  see  Stone  (1985)  for 
discussion  and  references.   Here,  where  g(x)   is  a  nuisance  function, 
important  reasons  to  avoid  high  dimensional  nonparametric  regressions  are  that 
a  projection  on  a  larger  set  of  functions  than  that  to  which  g(x)   belongs 
will lead to higher asymptotic variances for $\hat\beta$ in some cases, as noted in
Newey  (1991),  and  will  lower  the  rate  at  which  remainder  terms  converge  to 
zero,  affecting  accuracy  of  the  asymptotic  normal  approximation. 

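The correction term is derived first for the simplest case, where $h(v) =
g(x)$. Let $g(x,\theta) = \arg\min_{\tilde g\in\mathcal{G}} E_\theta[(y - \tilde g(x))^2]$ denote the projection of $y$
on $\mathcal{G}$ for a path. Note that for the vector of projections of elements of
$M(z)$ on $\mathcal{G}$, $\delta(x) = \Pi(M(z)|\mathcal{G})$, it follows that $E[M(z)g(x,\theta)] = E[\delta(x)g(x,\theta)]$
identically in $\theta$. Also, by $\delta(x) \in \mathcal{G}$, $E_\theta[\delta(x)g(x,\theta)] = E_\theta[\delta(x)y]$, so by
the chain rule,

(4.2)      $E[M(z)\partial g(x,\theta)/\partial\theta]|_{\theta_0} = \partial E[M(z)g(x,\theta)]/\partial\theta|_{\theta_0} = \partial E[\delta(x)g(x,\theta)]/\partial\theta|_{\theta_0}$

$= \{\partial E_\theta[\delta(x)g(x,\theta)]/\partial\theta - \partial E_\theta[\delta(x)g(x)]/\partial\theta\}|_{\theta_0}$

$= \partial E_\theta[\delta(x)\{y-g(x)\}]/\partial\theta|_{\theta_0} = E[\delta(x)\{y-g(x)\}S_\theta(z)].$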

Equation  (4.2)  implies  the  next  result. 

Proposition 5: If $h(v) = g(x)$ is the projection of $y$ on $\mathcal{G}$, then the
correction term is $\alpha(z) = \Pi(M(z)|\mathcal{G})[y-g(x)]$.
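
As a quick check on Proposition 5, consider a hypothetical special case that is not taken from the text: let $\mathcal{G}$ be all functions of $x$ with finite mean-square, and let the parameter be a weighted average $\beta_0 = E[w(x)g(x)]$ for a known weight $w(x)$ with finite second moment (the weight $w$ is an illustrative assumption), so that $m(z,\beta,h) = w(x)h(x) - \beta$. Under these assumptions the correction term turns the plug-in moment into the moment of a simple sample average:

```latex
\begin{align*}
M(z) &= \partial m(z,\beta_0,h)/\partial h = w(x), \qquad
D = \partial E[m(z,\beta,g)]/\partial\beta = -1,\\
\alpha(z) &= \Pi(w(x)\,|\,\mathcal{G})\,[y-g(x)] = w(x)\,[y-g(x)], \qquad E[\alpha(z)] = 0,\\
-D^{-1}\{m(z,\beta_0,g)+\alpha(z)\} &= w(x)g(x)-\beta_0 + w(x)[y-g(x)] = w(x)\,y-\beta_0 .
\end{align*}
```

This is the influence function of the sample average of $w(x_i)y_i$, which targets the same $\beta_0$ because $E[w(x)y] = E[w(x)g(x)]$ when $g(x) = E[y|x]$.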

A new example is an estimator for a semiparametric random effects model.
Let $(y_t, x_t)$ $(t = 1, 2)$ be sets of observations for two time periods, where $y_t$
is binary, and suppose that for $x = (x_1, x_2)$, $E[y_t|x] = \Phi([x_t'\beta_0 + \rho(x)]/\sigma_t)$,
$\sigma_1 = 1$, $\rho(x)$ is an unknown function, and $\Phi$ denotes the standard normal
CDF. This is a binary panel data model with $y_t = 1(x_t'\beta_0 + \alpha + \varepsilon_t > 0)$ for
an individual effect $\alpha$, and the conditional distribution of $\alpha + \varepsilon_t$ given
$x$ is $N(\rho(x), \sigma_t^2)$. This model generalizes Chamberlain's (1980) random
effects model by allowing the conditional mean of $\alpha$ to be unknown. In
contrast to Manski's (1987) semiparametric individual effects model, $\varepsilon_t$ is
allowed to be heteroskedastic over time, but the conditional distribution of
$\alpha + \varepsilon_t$ is restricted to be Gaussian.

An implication of this model is that

(4.3)     $\Phi^{-1}(E[y_1|x]) = \sigma_2\Phi^{-1}(E[y_2|x]) + (x_1 - x_2)'\beta_0.$
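
Equation (4.3) can be checked directly from the conditional mean specification. The lines below are a short verification sketch that uses only the model as stated, with $\sigma_1 = 1$, and inverts $\Phi$ period by period:

```latex
\begin{align*}
\Phi^{-1}(E[y_1|x]) &= x_1'\beta_0 + \rho(x),\\
\sigma_2\,\Phi^{-1}(E[y_2|x]) &= x_2'\beta_0 + \rho(x),\\
\Phi^{-1}(E[y_1|x]) - \sigma_2\,\Phi^{-1}(E[y_2|x]) &= (x_1-x_2)'\beta_0 .
\end{align*}
```

Differencing the two periods removes the unknown function $\rho(x)$, which is what makes the relation usable without specifying the conditional mean of the individual effect.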

This implication can be used to construct a semiparametric minimum distance
estimator by replacing the conditional expectations with nonparametric
estimators $\hat h_t(x) = \hat E[y_t|x]$ and choosing $\hat\beta$ and $\hat\sigma_2$ from the least squares
regression of $\Phi^{-1}(\hat h_1(x_i))$ on $x_{1i} - x_{2i}$ and $\Phi^{-1}(\hat h_2(x_i))$. This estimator can
also be generalized to the case where the distribution of disturbances is
unknown, by normalizing the scale of $\beta$ and replacing $\Phi^{-1}$ by a series
approximation to the unknown inverse marginal distribution functions, although
further development of this estimator is beyond the scope of this paper.

To derive the influence function of the estimator of $\beta_0$ and $\sigma_2$, note
that it is a semiparametric m-estimator with parameter vector $(\beta', \sigma_2)'$, $v = x$,
$m(z,\beta,h(v,\beta)) = A(x,h_2)[\Phi^{-1}(h_1(x)) - \Phi^{-1}(h_2(x))\sigma_2 - (x_1-x_2)'\beta]$, and $A(x,h_2) =
[(x_1-x_2)', \Phi^{-1}(h_2(x))]'$. Here, the correction terms are the only source of
variation, since $m(z,\beta_0,h(v,\beta_0)) = 0$. Also, $D = -E[A(x,h_2)A(x,h_2)']$ and
$M_j(z) = A(x,h_2)(-\sigma_2)^{j-1}\phi(\Phi^{-1}(h_j(x)))^{-1}$. Then by Proposition 5, $\alpha_j(z) =
A(x,h_2)(-\sigma_2)^{j-1}\phi(\Phi^{-1}(h_j(x)))^{-1}[y_j - h_j(x)]$.

(4.4)  $\psi(z) = -D^{-1}[\alpha_1(z)+\alpha_2(z)] = -D^{-1}A(x,h_2)\,\cdot$

$\{\phi(\Phi^{-1}(h_1(x)))^{-1}[y_1-h_1(x)] - \sigma_2\phi(\Phi^{-1}(h_2(x)))^{-1}[y_2-h_2(x)]\}.$

Proposition  5  can  be  generalized  to  linear  functionals  of  the  projection 
that  have  a  particular  property  specified  in  the  following  assumption. 

Assumption 4.1: $h(v) = A(v,g)$, $A(v,g)$ is a linear function of $g$, and
there is $\delta(x) \in \mathcal{G}$ such that for all $\tilde g(x) \in \mathcal{G}$,

(4.5)  $E[M(z)A(v,\tilde g)] = E[\delta(x)\tilde g(x)].$

By the Riesz representation theorem, equation (4.5) is equivalent to assuming
that the functional $E[M(z)A(v,g)]$ is mean-square continuous in $g$. This
condition is necessary for $\int M(z)h(v)dF(z)$ to be a $\sqrt{n}$-consistently estimable
functional of $h(v)$, as discussed in Newey (1991), so that the estimation of
$h(v)$ will affect the convergence rate of $\hat\beta$ unless Assumption 4.1 is
satisfied. Thus, for $h(v)$ a linear function of $g(x)$, Assumption 4.1 and
the form of the correction term given below characterize the adjustment for
mean-square projections.

Equation  (4.5)  leads  to  a  straightforward  form  for  the  correction  term. 
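Noting that $h(v,\theta) = A(v,g(\theta))$, differentiation gives


(4.6)  $E[M(z)\partial h(v,\theta)/\partial\theta]|_{\theta_0} = \partial E[M(z)h(v,\theta)]/\partial\theta|_{\theta_0} = \partial E[M(z)A(v,g(\theta))]/\partial\theta|_{\theta_0}$

$= \partial E[\delta(x)g(x,\theta)]/\partial\theta|_{\theta_0} = E[\delta(x)\{y-g(x)\}S_\theta(z)],$

where the last equality follows as in equation (4.2).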


Proposition 6: If Assumption 4.1 is satisfied, the correction term is $\alpha(z) =
\delta(x)[y-g(x)]$.

In order for this result to provide an interesting formula, it must be
possible to find $\delta(x)$. In a number of cases $\delta(x)$ takes a projection form
similar to that of Proposition 5. One interesting case is that where $x =
(x_1,x_2)$, $x_1$ may be a vector, $v = x_2$, and $h(v) = A(v,g) = \int_{\mathcal{A}} g(x_1,x_2)dx_1$.
In this case, $E[M(z)h(v)] = E[E[M(z)|v]h(v)] = E[\int_{\mathcal{A}} E[M(z)|x_2]g(x)dx_1] =
E[1(x_1\in\mathcal{A})f(x_1|x_2)^{-1}E[M(z)|x_2]g(x)] = E[\Pi(1(x_1\in\mathcal{A})f(x_1|x_2)^{-1}E[M(z)|x_2]|\mathcal{G})g(x)]$,
where $f(x_1|x_2)$ is the conditional density of $x_1$ given $x_2$.


Proposition 7: If $h(v) = \int_{\mathcal{A}} g(x_1,x_2)dx_1$, $x$ is absolutely continuous with
respect to the product measure corresponding to $dx_1$ and the distribution of
$x_2$ with density $f(x_1|x_2)$, and $1(x_1\in\mathcal{A})f(x_1|x_2)^{-1}E[M(z)|x_2]$ has finite
second moment, then the correction term is $\delta(x)[y-g(x)]$ for $\delta(x) =
\Pi(1(x_1\in\mathcal{A})f(x_1|x_2)^{-1}E[M(z)|x_2]|\mathcal{G})$.


An example is average approximate consumer surplus, where $x_1$ is a price
variable and $\hat\beta = \sum_{i=1}^n \int_a^b \hat g(x_1,x_{2i})dx_1/n$, which is a semiparametric m-estimator
with $M(z) = 1$. By Proposition 7, the influence function for this estimator
will be

(4.7)  $\psi(z) = \int_a^b g(x_1,x_2)dx_1 - \beta_0 + \Pi(1(a \le x_1 \le b)f(x_1|x_2)^{-1}|\mathcal{G})[y - g(x)].$

Results  for  exact  consumer  surplus  (i.e.  equivalent  variation)  and  where  the 
demand  function  is  a  nonlinear  function  of  a  projection  (e.g.  log-linear 
models)  are  analyzed  in  Hausman  and  Newey  (1991). 

Another case where $\delta(x)$ takes a projection form is where $h(v)$ is a
derivative of a projection evaluated at some other variable $v$. For $x \in \mathbb{R}^r$,
and a vector $\lambda = (\lambda_1,\ldots,\lambda_r)'$ of nonnegative integers, let $|\lambda| = \sum_{j=1}^r\lambda_j$
and denote a partial derivative by

(4.8)  $D^\lambda g(x) = \partial^{|\lambda|}g(x)/\partial x_1^{\lambda_1}\cdots\partial x_r^{\lambda_r}.$

Suppose that $h(v) = D^\lambda g(x)|_{x=v}$. Let $\bar M(v) = E[M(z)|v]$ and $f_v(v)$ and $f_x(x)$ be
the densities of $v$ and $x$ respectively, with respect to the same dominating
measure. Assuming that $v$ has a density that is differentiable to sufficient
order in the components of $v$ corresponding to nonzero components of $\lambda$,
with zero derivatives on the boundary of its support, repeated integration by
parts gives $E[M(z)h(v)] = \int\bar M(v)D^\lambda g(v)f_v(v)dv = (-1)^{|\lambda|}\int D^\lambda[\bar M(v)f_v(v)]g(v)dv =
(-1)^{|\lambda|}\int D^\lambda[\bar M(v)f_v(v)]|_{v=x}\,g(x)dx = E[(-1)^{|\lambda|}f_x(x)^{-1}D^\lambda[\bar M(v)f_v(v)]|_{v=x}\,g(x)] =
E[\delta(x)g(x)]$, $\delta(x) = (-1)^{|\lambda|}\Pi(f_x(x)^{-1}D^\lambda[\bar M(v)f_v(v)]|_{v=x}|\mathcal{G})$.

Proposition 8: If $h(v) = D^\lambda g(x)|_{x=v}$, $v$ and $x$ are absolutely continuous
with respect to the same measure, which is Lebesgue measure for the components
$\tilde x$ of $x$ corresponding to nonzero components of $\lambda$, the density $f_v(v)$ and
$E[M(z)|v]$ are continuously differentiable to order $|\lambda|$ in $\tilde x$, the support
of $x$ is a convex set with nonempty interior, and for each $\tilde\lambda \le \lambda$, $D^{\tilde\lambda}f_v(v)$
is zero on the boundary of the support of $x$ and $f_x(x)^{-1}D^\lambda[\bar M(v)f_v(v)]|_{v=x}$ has
finite second moment, then the correction term is $\delta(x)[y-g(x)]$ for $\delta(x) =
(-1)^{|\lambda|}\Pi(f_x(x)^{-1}D^\lambda[\bar M(v)f_v(v)]|_{v=x}|\mathcal{G})$.


An example with no derivatives involved is Stock's (1989) nonparametric
prediction estimator, where $\mathcal{G} = \{g_1(x_1) + x_2'\eta\}$, so that $g(x)$ is a
partially linear projection, $v = (v_1, v_2)$ is partitioned conformably with $x$,
and $\hat\beta = \sum_{i=1}^n[\hat g(v_i)-\hat g(x_i)]/n$. This is a semiparametric m-estimator where $\beta_0
= E[g(v)]-E[g(x)]$, $h_1(v) = g(v)$, $h_2(x) = g(x)$, and $M_1(z) = -M_2(z) = 1$.
From the form of the correction terms in Proposition 8, the influence function
of $\hat\beta$ is

(4.9)     $\psi(z) = g(v)-g(x)-\beta_0 + [\Pi(f_x(x)^{-1}f_v(x)|\mathcal{G}) - 1][y - g(x)].$


This result differs from Stock's in the inclusion of the term $g(v)-g(x)-\beta_0$,
because Stock's result only derived the conditional distribution given the
observations on $x$ and $v$. The variance of the second term is the same as
Stock's formula, because $\Pi(f_x(x)^{-1}f_v(x)|\mathcal{G}) = f_{x_1}(x_1)^{-1}f_{v_1}(x_1) +
(x_2-E[x_2|x_1])'E[\mathrm{Var}(x_2|x_1)]^{-1}E[v_2-E[x_2|x_1=v_1]]$, for $f_{x_1}(x_1)$ and $f_{v_1}(v_1)$
equal to the densities of $x_1$ and $v_1$ respectively. Proposition 8 also
gives the form of correction terms for the dynamic discrete choice estimators
of Ahn and Manski (1989) and Hotz and Miller (1989), and the average
derivative estimator of Hardle and Stoker (1989).

A correction term for density estimation can be derived under conditions
similar to those for the projection. Suppose that $h(v) = A(v,f_w)$, where
$A(v,f_w)$ is a linear function of the density $f_w(w)$ of a vector $w$, with
respect to some measure. Suppose that there is $\alpha(w)$ such that
$E[M(z)A(v,f_w)] = \int\alpha(w)f_w(w)dw$. Let $f_w(w|\theta)$ denote the density of $w$ for a
path. Then


$E[M(z)\partial h(v,\theta)/\partial\theta]|_{\theta_0} = \partial E[M(z)A(v,f_w(\theta))]/\partial\theta|_{\theta_0}$

$= \partial E_\theta[\alpha(w)]/\partial\theta|_{\theta_0} = E[\alpha(w)S_\theta(z)].$

Proposition 9: If $h(v) = A(v,f_w)$ for a density $f_w$ and there is $\alpha(w)$
such that $E[M(z)A(v,f_w)] = \int\alpha(w)f_w(w)dw$ then the correction term is
$\alpha(w)-E[\alpha(w)]$.


Existence of such an $\alpha(w)$ will follow from the Riesz representation theorem
if $\int f_w(w)^2dw$ is finite and $E[M(z)A(v,f_w)]$ can be extended to a linear
functional on the Hilbert space of square integrable ($dw$) functions that is
continuous. Continuity of $E[M(z)A(v,f_w)]$ in $f_w$, in the square integrable
sense, appears to be essentially necessary for the correction term to be
$\sqrt{n}$-consistent, although it is difficult to give a precise result, because the
usual parameterization for checking $\sqrt{n}$-consistency is the square root of the
density, rather than density itself.

A case where it is easy to compute a density correction term $\alpha(w)$ is
that with $w = (y',x')'$ and $h(v) = D^\lambda[\int a(y)f_w(w)dy]|_{x=v}$. Integration by
parts gives $E[M(z)A(v,f_w)] = \int\bar M(v)D^\lambda[\int a(y)f_w(w)dy]|_{x=v}f_v(v)dv =
(-1)^{|\lambda|}\int D^\lambda[\bar M(v)f_v(v)]|_{v=x}[\int a(y)f_w(w)dy]dx = \int\alpha(w)f_w(w)dw$, $\alpha(w) =
(-1)^{|\lambda|}D^\lambda[\bar M(v)f_v(v)]|_{v=x}\,a(y)$.

Proposition 10: If $h(v) = D^\lambda\int a(y)f_w(y,x)dy|_{x=v}$, $v$ and $x$ are absolutely
continuous with respect to the same measure, which is Lebesgue measure for the
components $\tilde x$ of $x$ corresponding to nonzero components of $\lambda$, the density
$f_v(v)$ and $E[M(z)|v]$ are continuously differentiable to order $|\lambda|$ in $\tilde x$,
the support of $x$ is a convex set with nonempty interior, and for each $\tilde\lambda \le
\lambda$, $D^{\tilde\lambda}f_v(v)$ is zero on the boundary of the support of $x$ and
$D^\lambda[\bar M(v)f_v(v)]|_{v=x}\,a(y)$ has finite second moment, then the correction term is
$\alpha(w) - E[\alpha(w)]$ for $\alpha(w) = (-1)^{|\lambda|}D^\lambda[\bar M(v)f_v(v)]|_{v=x}\,a(y)$.


This  result  gives  the  form  of  the  correction  term  for  Powell,  Stock,  and 
Stoker's  (1989)  weighted  average  derivative  estimator  and  Robinson's  (1989) 
test statistics. Another example is Ruud's (1986) density weighted least
squares  estimator,  which  is  treated  in  Newey  and  Ruud  (1991). 

There  may  be  other  interesting  cases  where  the  form  of  the  correction 
term  can  be  calculated.   Hopefully,  the  ones  given  here  illustrate  the 
usefulness  of  the  pathwise  derivative  calculation  of  the  influence  function. 
In  the  next  two  Sections,  regularity  conditions  for  the  validity  of  many  of 
these calculations are given.


5.    Regularity Conditions.

This  Section  develops  a  set  of  regularity  conditions  that  are  sufficient 
for  validity  of  the  pathwise  derivative  formula.   The  regularity  conditions 
are  based  on  direct  verification  that  remainder  terms  from  the  pathwise 
derivative  are  small. 

The rest of the paper will focus on semiparametric generalized method of
moments estimators where $h(v)$ does not depend on parameters. Let $m(z,\beta,h)$
be a vector of functions of the data observation $z$, the $q \times 1$ parameter
vector $\beta$ and a $J \times 1$ vector $h$, where $h$ represents a possible value of
a vector of functions $h(v) = (h_1(v_1),\ldots,h_J(v_J))'$ and each $v_j$ is a vector.
Also, assume that the moment condition $E[m(z_i,\beta_0,h(v_i))] = 0$ is satisfied.
Note that this setup allows $h(v)$ to include parameter values, by specifying
that some $v_j$ are trivial (can only take on one value). For example, some
elements of $h(v)$ might be trimming parameters, as in Newey and Ruud (1991).
Let $\hat h(v)$ denote an estimator of this vector function, $\hat m_n(\beta) =
\sum_{i=1}^n m(z_i,\beta,\hat h(v_i))/n$, and $\hat W$ a positive semi-definite matrix. The estimator
to be analyzed satisfies


(5.1)     $\hat\beta = \arg\min_{\beta\in\mathcal{B}} \hat m_n(\beta)'\hat W\hat m_n(\beta).$


Although $h$ is not allowed to depend on $\beta$ in this section, the results
are still useful for the general case, because they provide conditions for the
important intermediate result that $\sum_{i=1}^n m(z_i,\beta_0,\hat h(v_i,\beta_0))/\sqrt{n}$ is
asymptotically normal. This result will follow as a special case by
letting $\hat\beta = \sum_{i=1}^n m(z_i,\beta_0,\hat h(v_i,\beta_0))/n$.

Because of the importance of asymptotic normality of $\sum_{i=1}^n m(z_i,\hat h(v_i))/\sqrt{n}$
(for $m(z,h) = m(z,\beta_0,h)$), and because this function is the source of the
correction terms, it is useful to discuss this result first and organize the
discussion around a few high-level conditions. The pathwise derivative
calculation is very useful in formulating these conditions, because it gives
the form of a remainder term that should converge in probability to zero,
implying asymptotic normality. Let $\alpha_j(z)$ be the solution to equation (3.5),
$(j = 1,\ldots,J)$, and $\alpha(z) = \sum_{j=1}^J\alpha_j(z)$. Then, from the form of equation (3.7)
one would expect that the following remainder term $R_n$ should converge in
probability to zero:

$R_n = \sum_{i=1}^n m(z_i,\hat h(v_i))/\sqrt{n} - \sum_{i=1}^n u_i/\sqrt{n}$,   $u_i = m(z_i,h(v_i)) + \alpha(z_i) - E[\alpha(z)]$.

If $R_n \xrightarrow{p} 0$, then asymptotic normality of $\sum_{i=1}^n m(z_i,\hat h(v_i))/\sqrt{n}$ will follow
from the central limit theorem applied to $\sum_{i=1}^n u_i/\sqrt{n}$.

To give conditions for $R_n$ to be small it is helpful to decompose this
remainder term. For $M(z) = \partial m(z,h)/\partial h|_{h=h(v)}$, let $M_j(z)$ denote the $j$th
column of $M(z)$ and

$R^1_n = \sum_{i=1}^n\{m(z_i,\hat h(v_i)) - m(z_i,h(v_i)) - M(z_i)[\hat h(v_i) - h(v_i)]\}/\sqrt{n}$,

$R^2_{nj} = \sum_{i=1}^n\{M_j(z_i)[\hat h_j(v_i) - h_j(v_i)] - \alpha_j(z_i) + E[\alpha_j(z)]\}/\sqrt{n}$.

Note that $R_n = R^1_n + \sum_{j=1}^J R^2_{nj}$, so that $R_n \xrightarrow{p} 0$ if each of the following
conditions is satisfied.

Asymptotic Linearity:   $R^1_n \xrightarrow{p} 0$.

Asymptotic Differentiability:   $R^2_{nj} \xrightarrow{p} 0$,   $(j = 1,\ldots,J)$.
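
The decomposition of the remainder can be verified term by term. The sketch below only uses the definitions of the remainder terms, the identity $M(z)[\hat h(v)-h(v)] = \sum_j M_j(z)[\hat h_j(v)-h_j(v)]$, and $\alpha(z) = \sum_j\alpha_j(z)$:

```latex
\begin{align*}
R^1_n + \sum_{j=1}^J R^2_{nj}
&= \frac{1}{\sqrt n}\sum_{i=1}^n\Big\{ m(z_i,\hat h(v_i)) - m(z_i,h(v_i)) - M(z_i)[\hat h(v_i)-h(v_i)]\Big\}\\
&\quad + \frac{1}{\sqrt n}\sum_{i=1}^n\sum_{j=1}^J\Big\{ M_j(z_i)[\hat h_j(v_i)-h_j(v_i)] - \alpha_j(z_i) + E[\alpha_j(z)]\Big\}\\
&= \frac{1}{\sqrt n}\sum_{i=1}^n\Big\{ m(z_i,\hat h(v_i)) - m(z_i,h(v_i)) - \alpha(z_i) + E[\alpha(z)]\Big\} = R_n .
\end{align*}
```

The linear terms in $\hat h - h$ cancel exactly, so the first condition controls the Taylor remainder and the second controls how well the linear term is approximated by the centered correction term.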

Asymptotic linearity is similar to a condition formulated in Hardle and
Stoker (1989), and will follow from a Taylor expansion and a sample
mean-square convergence rate for $\hat h$ of slightly faster than $n^{-1/4}$, as
discussed below. Asymptotic differentiability is a deeper, more important
condition. It can be shown to hold if $\hat h_j(v)$ is a kernel estimator of
$\int a(y)f(y,x)dy$ using U-statistic projection results and higher-order bias
reducing kernels, as in Powell, Stock, and Stoker (1989) and Robinson (1988),
or if $\hat h_j(v)$ is a series estimator using properties of sample projections and
series approximations, as in Newey (1990b) and Section 6. It is also possible
to further decompose the asymptotic differentiability remainder in a way that
allows application of Andrews (1990b) stochastic equicontinuity results. Let

$R^{21}_{nj} = \sum_{i=1}^n\{M_j(z_i)[\hat h_j(v_i) - h_j(v_i)] - \int M_j(z)[\hat h_j(v) - h_j(v)]dF(z)\}/\sqrt{n}$,

$R^{22}_{nj} = \sqrt{n}\{\int M_j(z)[\hat h_j(v) - h_j(v)]dF(z) - \sum_{i=1}^n\{\alpha_j(z_i) - E[\alpha_j(z)]\}/n\}$.

Note that $R^2_{nj} = R^{21}_{nj} + R^{22}_{nj}$, so that $R^2_{nj} \xrightarrow{p} 0$ if each of the following
conditions is satisfied.

Stochastic Equicontinuity:   $R^{21}_{nj} \xrightarrow{p} 0$.

Functional Convergence:   $R^{22}_{nj} \xrightarrow{p} 0$.

Conditions for stochastic equicontinuity are given below. Functional
convergence is specific to the form of $h(v)$. One interesting result is that
if $E[M_j(z)|v] = 0$, then functional convergence holds trivially for $\alpha_j(z) =
0$. Thus, asymptotic linearity and stochastic equicontinuity are regularity
conditions for Proposition 3, as further discussed below. When $\alpha_j(z)$ is not
zero, functional convergence may follow from asymptotic normality of
mean-square continuous linear functionals of $h(v)$, since functional
convergence is only slightly stronger than asymptotic normality of
$\sqrt{n}\int E[M_j(z)|v][\hat h_j(v) - h_j(v)]dF(z)$.

Some of these high level conditions will be consequences of more
primitive hypotheses. The first of these limits the dependence between
observations that are far apart.


Assumption 5.1: $z_i$ is strictly stationary and strong ($\alpha$) mixing with mixing
coefficients $\alpha(t) = O(t^{-\mu})$ for $\mu > 2$.


The next condition is uniform consistency of $\hat h$.


Assumption 5.2: For each $j$ and the support $\mathcal{V}_j$ of $v_j$,
$\sup_{v_j\in\mathcal{V}_j}|\hat h_j(v_j)-h_j(v_j)| \xrightarrow{p} 0$.

Primitive conditions for this and the other assumptions about $\hat h$ are given in
Section 6. The following pair of hypotheses are more primitive conditions for
Asymptotic Linearity. The first imposes smoothness conditions on $m$. For a
random variable $Y$ let $\|Y\|_s = (E[|Y|^s])^{1/s}$, and for any $\varepsilon > 0$ let $\mathcal{H}(v,\varepsilon)
= \{h: \|h-h(v)\| < \varepsilon\}$.


Assumption  5.3:   |m(z,  p  ,  h(v) )  |  ,   is  finite  for  some  s'  >  2fi/(fi-l)   and 
there  is  a  neighborhood  N     of  p   €  >  0,  b.  (z)  >  1,  &.    (1  s   j+k  ^   2), 
&  .  i  2,   6   i  2  such  that  with  probability  one  m(z,p,h)   is  twice 
continuously  differentiable  on  Nx'Hiv,e), 

The  next  hypothesis  imposes  a  convergence  rate  on  h. 

O  At 

Assumption  5.4:   i)  for  some  h,    for  each  J,  Y.    , |h  .  (v. )-h .  (v. ) |  /n  =  o  (n  ): 

^1  =  1   J   1   J   1         p    ' 

ii)  either  m  is  linear  in  h  or  h  ^   -&  -(1/2). 


Assumption 5.4 is stated in terms of the sample $L_2$ norm rather than a more
general norm because the literature on convergence rates of nonparametric
estimators seems to give the sharpest results for this norm.

Lemma 5.1: If Assumptions 5.2 - 5.4 are satisfied then Asymptotic Linearity
is satisfied.

The  following  condition  is  sufficient  for  stochastic  equicontinuity. 


Assumption 5.5: $\|M(z)\|_{s'} < \infty$ for $s' > 2\mu/(\mu-2)$ and for each $j$, there is a
set $\mathcal{V}_j$ such that $E[1(v_j\notin\mathcal{V}_j)\|M_j(z)\|] = 0$ and either: a) $\mathcal{V}_j$ is a singleton,
or; b) $\mathcal{V}_j$ is convex with $\mathrm{Prob}(\mathrm{Boundary}(\mathcal{V}_j)) = 0$ and there is a positive
integer $d_j > \dim(v_j)/2$ such that $h_j(v)$ is continuously differentiable,
with bounded derivatives, to order $d_j$ on $\mathcal{V}_j$ and for all $|\lambda| \le d_j$,
$\sup_{v\in\mathcal{V}_j}|D^\lambda\hat h_j(v)-D^\lambda h_j(v)| \xrightarrow{p} 0$.

Lemma 5.2: If Assumptions 5.1 and 5.5 are satisfied, then Stochastic
Equicontinuity is satisfied.


Although  the  main  focus  here  is  asymptotic  distribution  theory,  for 
completeness  it  is  appropriate  to  give  a  consistency  result.   The  next 
hypothesis  imposes  identification  and  regularity  conditions  for  consistency. 
Let $\rho: \mathbb{R} \to \mathbb{R}$ be continuous at zero, with $\rho(0) = 0$.

Assumption 5.6: $E[m(z,\beta,h(v))] = 0$ has a unique solution at $\beta_0$, $\hat W \xrightarrow{p} W$,
$W$ is positive definite, and either a) $m(z,\beta,h(v))$ is convex in $\beta$ with
probability one, for each $\beta \in \mathcal{B}$, $E[\|m(z,\beta,h(v))\|] < \infty$, there is $b(z)$
such that $E[b(z)] < \infty$ and $\sup_{h\in\mathcal{H}(v,\varepsilon)}\|m(z,\beta,h)-m(z,\beta,h(v))\| \le b(z)\rho(\varepsilon)$, or;
b) $\mathcal{B}$ is compact, $m(z,\beta,h(v))$ is continuous in $\beta$, there is $b(z)$ and
$\rho(\varepsilon)$ continuous at zero such that $E[b(z)] < \infty$, $\sup_{\beta\in\mathcal{B}}\|m(z,\beta,h(v))\| \le b(z)$,
$\sup_{\beta\in\mathcal{B},h\in\mathcal{H}(v,\varepsilon)}\|m(z,\beta,h)-m(z,\beta,h(v))\| \le b(z)\rho(\varepsilon)$.

Theorem 5.3: If Assumptions 5.1 - 5.2 and 5.6 are satisfied then $\hat\beta \xrightarrow{p} \beta_0$.

Next, regularity conditions for Proposition 3 are given. Let $m_i = m(z_i,\beta_0,h(v_i))$.

Theorem 5.4: If Assumptions 5.1-5.6 are satisfied and for each $j$ either
$E[M_j(z)|v_j] = 0$ or, more generally $\hat h_j(v_j)$ and $h_j(v_j)$ are elements of a
set $\mathcal{H}_j$ such that $E[M_j(z)\tilde h_j(v_j)] = 0$ for all $\tilde h_j \in \mathcal{H}_j$, then for $\Omega =
E[m_im_i'] + \sum_{\ell=1}^\infty E[m_im_{i+\ell}' + m_{i+\ell}m_i']$,
$\sqrt{n}(\hat\beta-\beta_0) \xrightarrow{d} N(0,V)$,   $V = (D'WD)^{-1}D'W\Omega WD(D'WD)^{-1}$.


This theorem shows that Andrews (1990a) "independence" hypothesis, that
estimation of $h(v)$ does not affect the limiting distribution of $\hat\beta$, is a
consequence of orthogonality of $M(z)$ with the set of possible $h(v)$.

The next asymptotic normality result allows for a nonzero correction
term. Let $\Omega_\ell = E[u_iu_{i+\ell}']$, $\Omega = \Omega_0 + \sum_{\ell=1}^\infty(\Omega_\ell+\Omega_\ell')$.

Theorem 5.5: If for some $s' > 2\mu/(\mu-1)$, $\|\alpha_j(z)\|_{s'}$ is finite, $(j = 1, \ldots,
J)$, Assumptions 5.1 - 5.4, 5.6, and Asymptotic Differentiability are
satisfied, then $\sqrt{n}(\hat\beta-\beta_0) \xrightarrow{d} N(0,V)$,   $V = (D'WD)^{-1}D'W\Omega WD(D'WD)^{-1}$.

A consistent estimator $\hat\Omega$ of $\Omega$ is required to form a consistent
estimator of the asymptotic variance of $\hat\beta$. Such $\hat\Omega$ can be formed from
estimates $\hat u_i$ of $u_i$. Let $\hat\alpha_{ji}$ denote estimates of $\alpha_j(z_i)$ and

$\hat m_i = m(z_i,\hat\beta,\hat h(v_i))$,   $\hat u_i = \hat m_i + \sum_{j=1}^J[\hat\alpha_{ji}-\sum_{i=1}^n\hat\alpha_{ji}/n]$,   $\hat\Omega_\ell = \sum_{i=1}^{n-\ell}\hat u_i\hat u_{i+\ell}'/n$.

If $u_i$ is not autocorrelated then $\hat\Omega = \hat\Omega_0$ will be an appropriate estimator.
When $u_i$ may be autocorrelated, consider a weighted autocovariance estimator
like that in Newey and West (1987), with

$\hat\Omega = \hat\Omega_0 + \sum_{\ell=1}^L w(\ell,L)[\hat\Omega_\ell + \hat\Omega_\ell']$,

where $w(\ell,L)$ is a weight such that $\hat\Omega$ is positive semi-definite, such as
$w(\ell,L) = 1 - \ell/(L+1)$. Here $L$ can depend on the data, as is important for
applications. Given this estimator of $\Omega$, an estimator of the asymptotic
covariance matrix of $\hat\beta$ can be formed in the usual way, as

$\hat V = (\hat D'\hat W\hat D)^{-1}\hat D'\hat W\hat\Omega\hat W\hat D(\hat D'\hat W\hat D)^{-1}$.
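
To make the weighted autocovariance estimator concrete, the following is a minimal Python sketch for the special case of a scalar $u_i$, which is a simplifying assumption (the text allows a vector). The function names `autocov` and `bartlett_long_run_variance` are hypothetical, and the input is assumed to be a list of already computed, centered estimates $\hat u_i$. In the scalar case $\hat\Omega_\ell + \hat\Omega_\ell'$ reduces to $2\hat\Omega_\ell$.

```python
def autocov(u, lag):
    """Sample autocovariance at `lag`: sum_{i=1}^{n-lag} u_i * u_{i+lag} / n."""
    n = len(u)
    return sum(u[i] * u[i + lag] for i in range(n - lag)) / n


def bartlett_long_run_variance(u, L):
    """Weighted autocovariance estimate with weights w(l, L) = 1 - l / (L + 1).

    `u` holds scalar estimates u_hat_i (assumed centered); `L` is the number
    of lags, which must be smaller than len(u).
    """
    omega = autocov(u, 0)
    for lag in range(1, L + 1):
        weight = 1.0 - lag / (L + 1.0)
        # Scalar case: Omega_l + Omega_l' equals 2 * Omega_l.
        omega += weight * 2.0 * autocov(u, lag)
    return omega


if __name__ == "__main__":
    u_hat = [1.0, -1.0, 1.0, -1.0]
    print(bartlett_long_run_variance(u_hat, 0))  # no lags: plain variance
    print(bartlett_long_run_variance(u_hat, 1))  # one lag, Bartlett weight 1/2
```

With `L = 0` the function returns $\hat\Omega_0$, which matches the estimator suggested for the case without autocorrelation. The declining weights are what keep the estimate nonnegative in this scalar case.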


Theorem  5.6:      Suppose   that   Assumptions  5.1   and  5.3  are  satisfied  with     &_  =
6,^  =  2,      there   is     s  >  4u./(ll-2)     such   that      |m.|    and      \a  .(z.)[        are  finite 

10  r-   r-  ^  ^  J        X        s 

for  each     J,      w(l,L)      is  bounded  uniformly  in     I     and     L     and     lim,        w(i,L)   = 

1      for  each     i,       \\p-pj\   =  0  (l/Vn).    there   is  e     =  o(l)   such   that    1/n   =  0(&^), 

Up  n  n 

\h  .(v)-h  .(v)\      =  0  fe   ;,      y.",lla  ..-a  .Cz  Jll^/n  =  0  (^ )     and  either  a)  Q,  =  0, 
J  J         CO         p     n         ^1  =  1      ji     J     1  p     n  I 

i  ^   1,    and     n  =  n^;      or     b)     L  -^  co,      and     L  =  o   (&~'^).      Then,      V  -^  V. 
u  p    n 


As usual for minimum distance estimators, the asymptotic variance depends
on $W$, and an optimal (asymptotic variance minimizing) choice of $W$ is $\Omega^{-1}$
when $\Omega$ is nonsingular. The estimator $\hat\Omega$ can be used to form a feasible
version of the optimal minimum distance estimator, by using $\hat W = \hat\Omega^{-1}$ in
equation (5.1). The resulting estimator will be an optimal estimator that
adjusts for the presence of first-stage, plug-in estimators in the moment
functions, similarly to the estimator of Hansen (1985). For this choice of
$\hat W$, $(\hat D'\hat\Omega^{-1}\hat D)^{-1}$ will be a consistent estimator of the asymptotic variance of
$\hat\beta$.


6.    Series Estimation of Projection Functionals.

This  Section  develops  regularity  conditions  for  linear  functionals  of 
power  series  estimators  of  projections.   Power  series  are  considered  because 
they  are  computationally  convenient  and  the  most  complete  convergence  rate 
results  seem  to  be  available  for  them.   Although  in  some  contexts  power  series 
are  thought  to  be  inferior  to  other  approximating  functions,  because  of  their 
"roly-poly"  behavior  and  global  sensitivity  of  best  uniform  approximations  to 
singularities,  these  considerations  may  not  be  as  important  here,  where  the 
projection  estimator  is  a  nuisance  function.   Under  Assumption  4.1,   p 
depends  on  the  series  approximation  essentially  only  through  a  weighted 
average,  where  these  problems  with  power  series  seem  not  to  be  so  important. 
An  example  is  provided  by  the  Monte-Carlo  results  of  Newey  (1988a),  where  a 
semiparametric  power  series  estimator  performs  extremely  well  relative  to  a 
kernel  estimator. 

Here, the domain $\mathcal{G}$ of the projection will be assumed to take the form
in equation (4.1). The conditions to follow will depend on the maximum across
$\ell \le L$ of the dimension of $x_\ell$, which will be denoted by $a$. A power series
estimator of the projection can be obtained from a regression of $y$ on a
truncated power series with elements restricted to lie in $\mathcal{G}$, analogous to
Stone's (1985) spline estimator. Let $\lambda$ denote a vector of nonnegative
integers as before, and let $x^\lambda = \prod_{j=1}^r x_j^{\lambda_j}$. For a sequence $(\lambda(k))_{k=1}^\infty$ of
distinct such vectors, a power series is


(6.1)     pj^(x)  = 


X     ,  k  =  1,  . . . ,  dim(x   ) 
L+l,k  L+l 


A  (k-s )   ,      , .   /•  ~    \     , 

X      .  k  =  dim(x   )+l, 

L  +  l 


A  (k ) 
It  will  be  assumed  that   {x    },  j.  ,~   ,   consists  of  all  multivariate 

k>dim{x   ) 

L+l 
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powers  of  each  x.  for  £  ^  L,   and  no  more,  ordered  so  that   |A.(k)|   is 
monotonic  increasing.   This  assumption  imposes  the  essential  restriction  that 
each  p.  (x)   belongs  to  W,      and  the  spanning  condition  that  the  sequence 
includes  all  such  terms. 
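
The ordering of the multi-indices by total degree can be made concrete with a short sketch. The Python code below is illustrative only: the function names `multi_indices` and `power_terms` are hypothetical, the constant term ($\lambda = 0$) is included by assumption, and ties among indices with equal $|\lambda|$ are broken lexicographically, a choice the text does not specify.

```python
from itertools import product

def multi_indices(dim, max_degree):
    """All vectors of nonnegative integers of length `dim` with total
    degree at most `max_degree`, ordered so that |lambda| is increasing."""
    candidates = product(range(max_degree + 1), repeat=dim)
    kept = [lam for lam in candidates if sum(lam) <= max_degree]
    # Ties in |lambda| are broken lexicographically (an arbitrary choice).
    return sorted(kept, key=lambda lam: (sum(lam), lam))

def power_terms(x, max_degree):
    """Evaluate x^lambda = prod_j x_j ** lambda_j for each multi-index."""
    terms = []
    for lam in multi_indices(len(x), max_degree):
        value = 1.0
        for x_j, lam_j in zip(x, lam):
            value *= x_j ** lam_j
        terms.append(value)
    return terms
```

For a two-dimensional block and degree at most 2, `multi_indices(2, 2)` lists the six exponent vectors for $1, x_2, x_1, x_2^2, x_1x_2, x_1^2$, and truncating the list at $K$ terms gives a candidate $p^K(x)$ for that block.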

The estimator of the projection considered here is that obtained from the least squares regression of $y$ on $\hat K$ terms, where $\hat K$ is allowed to depend on the data. For $y = (y_1, \ldots, y_n)'$, $p^K(x) = (p_1(x), \ldots, p_K(x))'$, and $p = [p^{\hat K}(x_1), \ldots, p^{\hat K}(x_n)]'$, the estimator of $g(x)$ is

(6.2)     $\hat g(x) = p^{\hat K}(x)'\hat\pi$,   $\hat\pi = \hat\Sigma^{-}p'y/n$,   $\hat\Sigma = p'p/n$,

where $(\cdot)^{-}$ denotes a generalized inverse. Under the conditions to follow, $p'p$ will be nonsingular with probability approaching one, so that the choice of generalized inverse does not matter, asymptotically.
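
As a rough illustration of equation (6.2), the following Python sketch fits a univariate power series by least squares. It is a minimal sketch under stated assumptions, not the paper's procedure: the names `solve`, `basis`, and `series_fit` are hypothetical, $K$ is fixed rather than data based, $x$ is scalar, and the moment matrix is assumed nonsingular, so an ordinary linear solve stands in for the generalized inverse.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting (a is a list of rows, b is a list)."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; a generalized inverse is needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def basis(x, K):
    """Univariate power series p^K(x) = (1, x, ..., x^(K-1))'."""
    return [x ** k for k in range(K)]

def series_fit(xs, ys, K):
    """Return (pi_hat, g_hat) from the regression of y on K series terms."""
    n = len(xs)
    P = [basis(x, K) for x in xs]                      # n x K matrix p
    sigma = [[sum(P[i][a] * P[i][b] for i in range(n)) / n
              for b in range(K)] for a in range(K)]    # Sigma_hat = p'p/n
    py = [sum(P[i][a] * ys[i] for i in range(n)) / n for a in range(K)]
    pi_hat = solve(sigma, py)                          # Sigma_hat^{-1} p'y/n

    def g_hat(x):
        return sum(c * t for c, t in zip(pi_hat, basis(x, K)))

    return pi_hat, g_hat
```

When the data lie exactly on a polynomial of degree less than $K$, the fit reproduces that polynomial up to rounding error, which gives a simple check of the implementation.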

A  data  based  K  is  essential  for  making  operational  the  nonparametric 
properties  of  series  estimators,  allowing  the  estimator  to  adjust  to 
conditions  in  particular  applications.   It  would  also  be  interesting  to  know 
how  to  best  choose  K  in  the  current  context,  but  this  question  is  outside 
the  scope  of  this  paper. 

For computational purposes it may be useful to replace $p^K(x)$ with a nonsingular linear transformation to polynomials that are orthogonal with respect to some distribution, since these may have less of a multicollinearity problem than power series are known to have. Of course, this replacement will not affect the estimator. Also, note that the elements of each $x_\ell$ may be smooth, bounded transformations (e.g. the logit distribution function) of "original" variables, which may help to limit the sensitivity of the estimator to outliers. In the Monte Carlo example of Newey (1988a), such a transformation led to reduced sensitivity to the choice of $K$.

The function $h(v)$ that appears in the moments will be taken to be a linear function $A(v,g)$ of the projection $g$, as analyzed in Section 4, and $h(v)$ will be estimated by replacing $g$ by $\hat g$ in the function. By linearity of $A$ in $g$, the resulting estimator takes the form

(6.3)     $\hat h(v) = A(v,\hat g) = \hat A(v)'\hat\pi$,   $\hat A(v) = (A(v,p_1), \ldots, A(v,p_{\hat K}))'$.


Note  that  this  estimator  requires  that  A(v,g)  have  an  explicit  form  that 
does  not  depend  on  the  true  data-generation  process. 

An estimator of the correction term is required for estimation of the asymptotic variance of $\hat\beta$. Under Assumption 4.1, such an estimator can be constructed in a straightforward way. Let

(6.4)     $\hat\alpha(z) = \hat\Psi'\hat\Sigma^{-}p^{\hat K}(x)[y - \hat g(x)]$,   $\hat\Psi = \sum_{i=1}^{n}[\partial m(z_i,\hat\beta,\hat h(v_i))/\partial h]\hat A(v_i)/n$.

By Assumption 4.1, $\hat\Psi$ will be an estimator of $\int\delta(x)p^K(x)\,dF(x)$, so that $\hat\Psi'\hat\Sigma^{-}p^K(x)$ is an estimator of the regression of $\delta(x)$ on $p^K(x)$, which will approximate $\delta(x)$ for large $K$ and $n$. Alternatively, $\hat\alpha(z)$ can be viewed as the estimator of the correction term obtained by treating $g(x)$ as if it were a parametric regression, with $K$ fixed. This procedure results in a consistent estimator of the correction term because it accounts properly for its variance, while bias from the series approximation will be small because of smoothness restrictions on $g(x)$ and $\delta(x)$ imposed below.
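
The "parametric regression with $K$ fixed" reading of the correction term can be sketched in a few lines. The Python code below is a hedged illustration, not the paper's implementation: it assumes a two-term series (a constant and a scalar $x$), takes $h(v) = g(x)$ so that the vector $\hat A(v_i)$ is just $p^K(x_i)$, and treats the derivative $\partial m/\partial h$ as a user-supplied list `dm_dh`; the function name `correction_terms` is hypothetical.

```python
def correction_terms(xs, ys, dm_dh):
    """Return (alpha_hat, residuals) for a two-term series (1, x).

    alpha_hat_i = Psi_hat' Sigma_hat^{-1} p(x_i) [y_i - g_hat(x_i)], with
    Psi_hat = sum_i dm_dh_i * p(x_i) / n (the case A_hat(v_i) = p(x_i)).
    """
    n = len(xs)
    P = [[1.0, x] for x in xs]
    sigma = [[sum(p[a] * p[b] for p in P) / n for b in range(2)]
             for a in range(2)]                       # Sigma_hat = p'p/n
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    sigma_inv = [[sigma[1][1] / det, -sigma[0][1] / det],
                 [-sigma[1][0] / det, sigma[0][0] / det]]
    py = [sum(p[a] * y for p, y in zip(P, ys)) / n for a in range(2)]
    pi_hat = [sum(sigma_inv[a][b] * py[b] for b in range(2)) for a in range(2)]
    resid = [y - (pi_hat[0] * p[0] + pi_hat[1] * p[1]) for p, y in zip(P, ys)]
    psi = [sum(d * p[a] for d, p in zip(dm_dh, P)) / n for a in range(2)]
    # w = Psi_hat' Sigma_hat^{-1}: coefficients of the regression of dm_dh on p
    w = [sum(psi[a] * sigma_inv[a][b] for a in range(2)) for b in range(2)]
    alpha = [(w[0] * p[0] + w[1] * p[1]) * e for p, e in zip(P, resid)]
    return alpha, resid
```

In the special case where `dm_dh` is identically one and the series contains a constant, the regression of `dm_dh` on the series terms reproduces the constant, so each estimated correction term equals the corresponding least squares residual.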

The following conditions are needed to apply the results of Newey (1991). Let $X$ denote the vector consisting of the union of all distinct variables appearing in $x_\ell$, $(\ell \le L)$, and let $\tilde{\mathcal G} = \{\sum_{\ell=1}^{L} g_\ell(x_\ell) : E[g_\ell(x_\ell)^2] < \infty\}$. The first condition is sufficient for $\tilde{\mathcal G}$ to be closed.

Assumption 6.0: i) For each $x_\ell$, $(\ell = 1, \ldots, L)$, if $\tilde x$ is a subvector of $x_\ell$ then $\tilde x = x_{\ell'}$ for some $\ell' \le L$; ii) There exists a constant $c > 1$ such that for each $\ell$, with the partitioning $X = (x_\ell', x_\ell^{c\prime})'$, for all $a(X) \ge 0$, $c\int a(X)\,d[F(x_\ell)\cdot F(x_\ell^c)] \ge E[a(X)] \ge c^{-1}\int a(X)\,d[F(x_\ell)\cdot F(x_\ell^c)]$; iii) Either $\eta = 0$ (i.e. $x_{L+1}$ is not present) or $x_{L+1}$ is bounded and for the closure $\bar{\mathcal G}$ of $\tilde{\mathcal G}$, $E[\{x_{L+1}-\Pi(x_{L+1}|\bar{\mathcal G})\}\{x_{L+1}-\Pi(x_{L+1}|\bar{\mathcal G})\}']$ is nonsingular.

The next condition requires that the support of $X$ be a box and places a lower bound on its distribution.

Assumption 6.1: There are finite $X_{ju} > X_{jb}$, $\nu_j \ge 0$, $(j = 1, \ldots, \dim(X))$ such that the support of $X$ is $\prod_{j=1}^{\dim(X)}[X_{jb}, X_{ju}]$ and the distribution of $X$ has absolutely continuous component with density bounded below by $C\prod_{j=1}^{\dim(X)}[(X_{ju}-X_j)(X_j-X_{jb})]^{\nu_j}$ on the support.

The nonsingularity condition is a normalization, unless $\eta$ is a parameter of interest, where it is an identification assumption for $\eta$. Let $\epsilon = y - g(x)$.

Assumption 6.2: $|\epsilon|_s$ is finite for $s \ge 2$ and $E[\epsilon^2|x]$ is bounded.

The  bounded  second  conditional  moment  assumption  is  quite  common  in  the 
literature  (e.g.  Stone,  1985),  and  simplifies  the  regularity  conditions. 

Assumption 6.3: Either a) $z_i$ is uniform mixing with mixing coefficients $\phi(t) = O(t^{-\mu})$, $(t = 1, 2, \ldots)$, for $\mu > 2$ or; b) there exists $c(t)$ such that $|E[\epsilon_i\epsilon_{i+t}|x_i,x_{i+t}]| \le c(t)$ and $\sum_{t=1}^{\infty}c(t) < \infty$.

This assumption is restrictive, but covers many cases of interest, including independent observations and dynamic nonparametric regression with $g(x_i) = E[y_i|x_i, y_{i-1}, x_{i-1}, y_{i-2}, \ldots]$. The next condition restricts the amount of variation allowed in the choice of number of terms $K$.

Assumption 6.4: $K = \hat K$ such that with probability approaching one, $\underline K \le \hat K \le \bar K$ where $\underline K = \underline K(n)$ and $\bar K = \bar K(n)$ are sequences of constants, and there is $\epsilon > 0$ such that $\underline K(n) \ge n^{\epsilon}$ for all $n$ large enough.

The bounds $\underline K$ and $\bar K$ control the bias and variance, respectively, of $\hat g$. The next set of conditions imposes smoothness assumptions used to control the bias of the estimator.

Assumption 6.5: $g_\ell(x_\ell)$ is continuously differentiable to order $d$, $(\ell \le L)$.

Two results will be given, because the conditions are weaker and simpler in the special case where $h(v) = g(x)$, meriting its separate treatment. For any nonnegative integer $d$ let

(6.5)     $\zeta_d(K) = K^{1+\nu+2d}$.

The covariance matrix of $\hat\beta$ can be estimated by the procedure discussed in Section 5, using the estimators of $\alpha_j(z)$ given above. The asymptotic distribution results will include consistency of this variance estimator.

Theorem 6.1: Suppose that Assumptions 5.1, 5.3, 5.6, 6.0-6.5 are satisfied,

s' 
and  for  each     J,      i)     s  >  ^ii/Cii-Z),      ElWH  .(z)\\      ]      is  finite  for  some     s'    > 

2 
^\l/(\1l-2)  and     E[\\M  .(z)-5  .(x)\\    \x]      is  bounded;      ii)  S  .(x)     is  continuously 

differentiable   to  order     d       on     x;      Hi)   each  of   the  following  converge   to 

o 

zero:  K^C^(K)^/n.      K^^^^CK)^-"^  ^"^ ,      K^^^^^(K)k'^^'' ,      v^'^^^^S^^'^;  iv) 

either     m(z,^^,h)     is  linear  in     h     or     K/Vn  +  VnK  =  o(n       O2);    v)  e     = 

n   K   Cq(K)(K   /vn  +  K    )+n   K5   =  o(l)     and  either  a)   Q.  =  0,    £  > 

1,    n  =   n.;  or  b)     L  -^   00,  and     L  =  o   (s'h.      Then  Vn(^-f3^)   -^  N(0,V)     and 
u  p    ri  u 

V  -^V. 
The upper bound on the rate of growth for the number of terms in the series expansion is $n^{1/4}$ when the density of $X$ is bounded away from zero ($\nu = 0$), which is less than the $n^{1/2}$ rate derived in Newey (1990b). This result also requires existence of derivatives of both $g(x)$ and $\delta(x)$ up to order more than the largest dimension of an additive component, as in Newey (1990b).

To obtain asymptotic normality in the more general case, where $h(v) = A(v,g)$ is some other linear function of $g$, it is useful to impose a continuity condition on $A(v,g)$ as a function of $g$. Let $\mathcal V$ denote the support of $v$, and denote supremum Sobolev norms by

$\|h(v)\|_d = \sup_{|\lambda|\le d,\, v\in\mathcal V}|D^\lambda h(v)|$,   $\|g(x)\|_d = \sup_{|\lambda|\le d,\, x}|D^\lambda g(x)|$.

Assumption 6.6: There is a constant $C$ and an integer $\Delta$ such that $\|A(v,g)\|_0 \le C\|g(x)\|_\Delta$.

This Assumption will imply that the bias from approximating the function $h_0(v)$ by a linear combination of $\hat A(v)$ is bounded by the bias of approximating $g(x)$ and its derivatives to order $\Delta$ by a linear combination of $p^K(x)$. Unfortunately, for multivariate functions, a literature search has not yet revealed bias bounds for approximating a function and its derivatives by power series, except under part b) of the following condition.

Assumption 6.7: Either a) $r = 1$ or $\Delta = 0$ or; b) for each $\ell$, $g_\ell(x_\ell)$ is continuously differentiable to all orders, and there is a constant $C$ such that $\sup_{x_\ell}|D^\lambda g_\ell(x_\ell)| \le C^{|\lambda|}$ for all $\lambda$.

Condition b) implies an approximation rate for $g(x)$ and its derivatives that is faster than $K^{-\alpha}$ for any $\alpha$.

When $\hat K$ is random, it is useful to also have an approximation rate for $\delta(x)$. In order that the results would apply to many cases, with more general approximation results that one anticipates will appear eventually, the rest of the results of this section will be stated in terms of

e^(K)  =  min  |5(x)-p*^(x)'7i|„. 
on  ^ 

Let $\mathcal K = \mathcal K(n) = [\underline K, \bar K]$.


Theorem 6.2: Suppose that Assumptions 4.1, 5.1, 5.3, 5.6, 6.0-6.7 are

satisfied  and   let      a.     =  d/n,     and     a  -   (d/n.)-L     under   Assumption  6.7  a)   and     a. 

=  a  =  +00  under  Assumption  6.7  b).      Also  suppose   that   for  each     j. 

|m(2,p  h(v))|  ,   and  \6  .(x) [y  .-g  .(x)\    ,      are  finite  for     s'    >  ^[i/(^i-Z).    and 
^  s  J  J       J  ^ 

i)   Z,/KC.CK;^K~'^"  -^  0     and     }^nK'°'oejK)   -^  0;      ii)   Either  a)   z.  is  uniform 

mixing  or     b)   J^^ft  ^  CK^  K       o  — >  0;  Hi)   Either  a)     m(z.h)      is   linear   in     h 

and     €  =  K^^^i:jK)K~°-   +  n''^[K{:jK)+   K^^^C-CTc/j/v^  +  [l^eJK)^]^^^    -^  0.    or 

b)     yrMcAK)^lK/n+YC^"-l   =  o(l)     and     e   =  n^^l^^^C^jDll^^^/^n   +  k""' 7  + 
A         -  n  A  - 

n^^¥p^^C,^(K)^/V?i   +  flj^^^CK)^]^^^   -^  0;      iv)     either  a)   n^  =  0.    I  ^   1 .    h  = 

fi_;  or  b)     L  -^  <x>.      and     L  =  o   (e~^).      Then     Vn(^-(i^)   -^  N(O.V)      and     V   -^ 
(J  P  n  u 

V. 


The  smallest  upper  bound  on  the  number  of  terms  allowed  by  this  theorem  is 

n   ,   for  V  =  0     and  s  =  oo,   a  rate  also  derived  in  Newey  (1991).   This 

result also requires existence of derivatives of $g(x)$ of order more than 3/2 the maximum dimension of the additive components, but imposes weak smoothness restrictions on $\delta(x)$.

The requirement that the bias go to zero faster than $1/\sqrt n$, as needed for $\sqrt n$-consistency, is that $\sqrt n\,\underline K^{-\alpha_0}\epsilon_\delta(\underline K)$ converge to zero. This term is the product of $\sqrt n$, the approximation rate for $g(x)$ (i.e. the bias from estimating $g(x)$ by truncated series), and the approximation rate for $\delta(x)$. Consequently, "undersmoothing" is not required for asymptotic normality with "plug-in" series estimators, as it is for some kernel estimators, because the bias from estimating $h(v)$ does not have to go to zero faster than $1/\sqrt n$. This result is a consequence of the usual orthogonality property of mean-square projections. Let $g_K(x)$ and $\delta_K(x)$ be the population regression on $p^K(x)$ of $g(x)$ (or $y$) and $\delta(x)$, respectively, and $h_K(v) = A(v,g_K)$ the corresponding value of $h_0(v)$. Using Assumption 4.1, the population first order bias term, analogous to that considered by Stoker (1990) for kernels, is

(6.6)     $E[M(z)\{h_K(v)-h(v)\}] = E[\delta(x)\{g_K(x)-g(x)\}] = -E[\{\delta_K(x)-\delta(x)\}\{g_K(x)-g(x)\}]$,

so that the bias term for $h$ is equal to the product of the bias terms for $\delta$ and $g$.
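
The second equality in (6.6) can be checked in one line. The sketch below assumes only that $g_K$ and $\delta_K$ are the mean-square projections of $g$ and $\delta$ on the span of $p^K(x)$, so that the projection residual $g_K - g$ is orthogonal to every linear combination of $p^K(x)$, including $\delta_K$.

```latex
\begin{align*}
E[\delta(x)\{g_K(x)-g(x)\}]
  &= E[\{\delta(x)-\delta_K(x)\}\{g_K(x)-g(x)\}]
     + E[\delta_K(x)\{g_K(x)-g(x)\}] \\
  &= E[\{\delta(x)-\delta_K(x)\}\{g_K(x)-g(x)\}] + 0 \\
  &= -E[\{\delta_K(x)-\delta(x)\}\{g_K(x)-g(x)\}].
\end{align*}
```

By the Cauchy-Schwarz inequality, the absolute value of the last line is at most the product of the root-mean-square approximation errors of $\delta_K$ and $g_K$, which is the sense in which the bias is a product of two bias terms.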

Under Assumption 6.1 with $\nu = 0$, $K$ nonrandom, and Assumption 6.7 a), these results will also apply to uniform knot spline estimators of $g(x)$, if the definition of $\zeta_d(K)$ is changed to $\zeta_d(K) = K^{.5+d}$. More generally, the results apply to any series estimator satisfying Assumptions 3.1-3.8 in Newey (1991), although these do not allow for Fourier series estimators.


7.    Examples 

This Section gives primitive regularity conditions for the validity of the examples of Section 4, and one or two additional examples. To save space, this Section has a special format, where each subsection gives an estimator, an estimator of its asymptotic covariance, and a result on $\sqrt n$-consistency and asymptotic normality of the estimator and consistency of the estimated variance, with discussion of the results reserved until the end.

To  give  more  specific  results  on  the  rate  at  which  the  number  of  terms 
can  grow  it  will  be  useful  to  impose  the  following  Assumption. 

Assumption 7.1: $\bar K(n) = n^{\Gamma}$ and $\underline K(n) = n^{\gamma}$ for some $\Gamma > \gamma > 0$.

7.1   Semiparametric Random Effects for Binary Panel Data


CP;.o^o)'  =  (!■'', A. An  ^I-'^.A.*  ^(h,(x.)).  S-.    =  1;   X.  =  (x.'  X.;)'. 
1   2      ^1=1  1  1    ^1=1  1      1   1       1         1      il   i2 

h.(x)  =  Ely,  |x],   (t  =  1,  2),   A.  =  I .  (x.  1 -x. '  $"-^  (h„(x.  ) ) )' , 
t         t  1     1   il   i2      2   1 

I.  =  1(0  <  h,(x.)  <  1   and  0  <  h^(x.)  <  1),   . 
1  1   1  2   1 

V  =  ^(I.;^A.Ap-^I.2^G.GMI.;^A.An-^   G.  =  A.  ^c"!  (h^  (x.  ))  -  AM^^.^^^'} 

+  It=l^"^^^~^°'tf^j  =  l'^^*'^^^t^''j^^^"^P^^''j^^"'^'P^^''i^^yti  "  ^t^'^l^^^' 

Theorem  7.1:      Suppose   that      i)     z.      are   i.i.d.;    ii)   Assumptions  6.0  and  6.1 

are  satisfied  for     x  =   (x  ,x   ).      iii)     p(x)  has  continuous  derivatives  of  up 

to  order     d     on     X;      iv)     E[(x  -x  , p(x) )(x  -x  , p(x) )' ]      is  nonsingular;      vi) 

Assumption  7.1    is  satisfied  with     T  <   1/4.   r  >  max{2r^/d,  ZTk/d^,    ^/(d+d_)>. 

o         o 

Then     V?l[(^'  ,S-^)   -   (p'  ,<r^)]   -^  N(0,    V)     and     V  -^  V. 
7.2    Nonparametric Consumer Surplus

^  =  ri=i>r^g^^r^2i^^^/"' 

^  =  ^i=i^/^'    ^  =^^g^^r^2i^^^i  -^ 

+  <I  "  J^P^(x^,X2.)dx^/n}'rV(x.)[y.  -  i(x.)]. 
Theorem 7.2: Suppose that i) $z_i$ are i.i.d.; ii) Assumptions 6.0 - 6.2, and
6.5  are  satisfied  for     v  =  0,      s  >  4.      iii)     x  =   (x   ,x^)      is  absolutely 
continuous  with  respect    to   the  product   of   the  uniform  density  on     d     and 

the  distribution  of     x  ,      with  bounded  density     f(x)     and  either  a)  ^  = 

~  1  2  1-11  2     1 

{g.(x.,x^)   +  X  't)},      and  f(x  ,x  )     f(x^)     and     E[x    \x  ]     are  continuously 

different iable   to  order     d_  on  the  support   of     x       or     b)     '[l(f(x)\^)      is 

o 

continuously  different iable  in     X     to  order     d.  on  the  support  of  x.      iv) 

o 

Assumption  7.1    is  satisfied  with     d  >   3n./Z,      T  <    (s-2)/5s,      y  >  n./[2(d+dj] , 

o 

r  >  Tnyd.      Then     Vn(^-p)   -^  N(0.    V)     and     V  -^  V. 


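7.3    Nonparametric Prediction

$\hat\beta = \sum_{i=1}^{n}[\hat g(v_i) - \hat g(x_i)]/n$,

$\hat V = \sum_{i=1}^{n}\hat u_i^2/n$,   $\hat u_i = \hat g(v_i) - \hat g(x_i) - \hat\beta + \{\sum_{j=1}^{n}[p^K(v_j) - p^K(x_j)]/n\}'\hat\Sigma^{-}p^K(x_i)[y_i - \hat g(x_i)]$.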

Theorem  7.3:      Suppose   that      i)     z.      are   i.i.d.;    ii)   Assumptions  6.0  -  6.2,    and 
6.5   are  satisfied  for     v  =  0     and     s  >  4,      iii)     v     is  absolutely  continuous 

with  respect    to     x     with  bounded  density     f(v)     and  either  a)  ^  =   (g  (x  )  + 

2  12  1-11  2     1 

X  't)},    V  =   (v  ,x   ),      and     f  \(x   )     f  \(x   )     and     Eix    \x   ]      are  continuously 

different  iable  to  order     d..  on  the  support  of     x       or     b)     Tl(f(x)\'S)     is 

o 

continuously  different iable  in     X     to  order     d„  on  the  support  of     x.      iv) 

o 

Assumption  7.1   is  satisfied  with     d  >   3a/2,  T  <    {s-2)/5s,  y  >  ny[2(.d+dj]. 

o 

r  >  rn/d.      Then     Vn(^-^)   -^  N(0,    V)  and     V  -^  V. 
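
The prediction estimator $\hat\beta = \sum_i[\hat g(v_i) - \hat g(x_i)]/n$ and its variance estimator $\hat V = \sum_i\hat u_i^2/n$ can be sketched numerically. The Python code below is a minimal illustration under strong simplifying assumptions: a two-term series (a constant and a scalar $x$), a fixed number of terms, and a hypothetical function name `prediction_estimate`. It is meant to show how the pieces fit together, not to reproduce the conditions of Theorem 7.3.

```python
def prediction_estimate(xs, vs, ys):
    """Two-term series (constant and x) version of the Section 7.3
    formulas; returns (beta_hat, V_hat)."""
    n = len(xs)
    P = [[1.0, x] for x in xs]           # p^K(x_i) with K = 2
    Pv = [[1.0, v] for v in vs]          # p^K(v_i)
    sigma = [[sum(p[a] * p[b] for p in P) / n for b in range(2)]
             for a in range(2)]          # Sigma_hat = p'p/n
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    sigma_inv = [[sigma[1][1] / det, -sigma[0][1] / det],
                 [-sigma[1][0] / det, sigma[0][0] / det]]
    py = [sum(p[a] * y for p, y in zip(P, ys)) / n for a in range(2)]
    pi_hat = [sum(sigma_inv[a][b] * py[b] for b in range(2)) for a in range(2)]

    def g_hat(p):
        return pi_hat[0] * p[0] + pi_hat[1] * p[1]

    beta_hat = sum(g_hat(pv) - g_hat(p) for pv, p in zip(Pv, P)) / n
    # psi = sum_j [p(v_j) - p(x_j)] / n, the derivative of beta_hat in pi_hat
    psi = [sum(pv[a] - p[a] for pv, p in zip(Pv, P)) / n for a in range(2)]
    w = [sum(psi[a] * sigma_inv[a][b] for a in range(2)) for b in range(2)]
    u_hat = [g_hat(pv) - g_hat(p) - beta_hat
             + (w[0] * p[0] + w[1] * p[1]) * (y - g_hat(p))
             for pv, p, y in zip(Pv, P, ys)]
    v_hat = sum(u * u for u in u_hat) / n
    return beta_hat, v_hat
```

A standard error for $\hat\beta$ would then be the square root of `v_hat / n`. With noiseless linear data and a constant shift between $v_i$ and $x_i$, both the residuals and the centered differences vanish, so the estimated variance is zero up to rounding.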
7.4      Average Derivatives

$\hat V = \sum_{i=1}^{n}\hat u_i\hat u_i'/n$,   $\hat u_i = m(z_i,\hat h(x_i)) - \hat\beta + [\sum_{j=1}^{n}\{\partial m(z_j,\hat h(x_j))/\partial h\}\{\partial p^K(x_j)/\partial x_1\}'/n]\hat\Sigma^{-}p^K(x_i)[y_i - \hat g(x_i)]$.


Theorem  7.4:      Suppose  that      i)  Assumption  5.1    is  satisfied,    and     g(x.)   = 

Ely  .\x  ., y  ._, X  ._,...] ;    ii)  K   =  K(n)    is  not   random;    iii)   Assumptions  6.0  - 

6.2,    6.5,    and  6.1  b)   are  satisfied,    there   is     s'    >  4yi/(^i-2)     such   that 

\a.(z)\    ,    <  00,  (J  =   1,    ...,  J),      X     is  absolutely  continuous  respect    to   the 
J  ^ 

product   of  Lebesgue  measure  on     x       and   the  distribution  of  all   elements 

of     X     other   than     x       with  density     f(x)      that    is  continuously  different iable 

in     X       on   the   interior  of  a  convex  support,      df(x)/dx       zero  on   the  boundary 

-12  -x 

of   the  support,      and     E[\\f(x)      df(x)/dx   W    ]    is  finite.      v)   K  ^  n       for  some     if 

r 

>  0     and     K  =  0(n   )     for  either     a)     m(z,h)      linear   in     h     and     F  < 
(s-2)/[s(7+4v)],      or     b)      T  <   (s-2)/ [s(14+4v)] .      Then     Vn(0-^)   -^  N(0,    V) 
and     V  -^  V. 


7.5       Discussion 

The conditions in Section 7.4 for multidimensional average derivatives are quite restrictive, but could be relaxed if better approximation rate results were available, on approximation of derivatives and unbounded functions by power series. In particular, nonrandom $K$ results from not having an approximation rate for $\delta(x)$, which is unbounded for average derivatives. Also, one can relax these conditions substantially for weighted average derivatives, where $\beta_0 = E[w(x)\partial g(x)/\partial x]$ and the weight function is such that $f(x)^{-1}w(x)\partial f(x)/\partial x + \partial w(x)/\partial x$ is continuously differentiable on the support of $x$, including allowing for random $K$.

The estimators for the semiparametric random effects and for nonparametric consumer surplus seem to be new. The result for nonparametric prediction is the first result on $\sqrt n$-consistency of an estimator, and includes the unconditional variance in the estimation of the asymptotic variance. Series estimators for average derivatives were previously suggested by Andrews (1991), although the result here includes conditions for $\sqrt n$-consistency and applies to time series.
Appendix A:   Proofs of Theorems


Throughout this appendix $C$ will denote a generic positive constant that can be different in different uses and $o_p$ will denote $o_p(1)$.

Proof  of  Theorem  2.1:   Pathwise  differentiability  of  )i(F  )   follows 

immediately from Theorem 2.1 of Van der Vaart (1991), since asymptotic

linearity of $\hat\beta$ and the Lindeberg-Levy central limit theorem imply that for

any  S  (z)  e  !f,       (v^(p-p  ) '  ,X]._  S  (z.  )/v^)'   converges  in  distribution  to 
o  U     1  —  1  6   i 

$N(0, E[(\psi(z)',S_\theta(z))'(\psi(z)',S_\theta(z))])$. Furthermore, by the final
conclusion of Lemma A.1 of Van der Vaart, it follows that for any vector $b$,
b' (G   [/i(F   )-p(F  -,)])   converges  to  b' E[0(z)S  (z)  ] ,   while  by  pathwise 

ZS       ZU  B 

differentiability it follows that $b'E[\psi(z)S_\theta(z)] = b'E[d(z)S_\theta(z)]$. Since this equality must hold for any $b$ and path, it follows by Assumption 2.1 that $E[(\psi(z)-d(z))s(z)] = 0$ for all mean-zero $s(z)$, so that choosing $s(z)$ to be any element of $\psi(z)-d(z)$, it follows that $\psi(z) = d(z)$.

Proof of Lemma 5.1: It suffices to prove the result for scalar $m$. Let $\hat h_i = \hat h(v_i)$, $h_i = h_0(v_i)$, and $m(z,h) = m(z,\beta_0,h)$. By Assumption 5.2, $\max_{i\le n}\|\hat h_i - h_i\| < \epsilon$ w.p.a.1. Also, a standard result implies $\max_{i\le n}\{b_{02}(z_i)\} = O_p(n^{1/\delta_2})$. Thus, by an expansion,

(A.1)     $\|\sum_{i=1}^{n}[m(z_i,\hat h_i) - m_i - M(z_i)(\hat h_i - h_i)]/n\| \le \sum_{i=1}^{n}\|\partial^2 m(z_i,\bar h_i)/\partial h\partial h'\|\,\|\hat h_i - h_i\|^2/n$

$\le \max_{i\le n}\{b_{02}(z_i)\}\sum_{i=1}^{n}\|\hat h_i - h_i\|^2/n = O_p(n^{1/\delta_2})\,o_p(n^{-(1/2)-1/\delta_2}) = o_p(n^{-1/2})$.   ■


Proof  of  Lemma  5.2:   Suppress  the  J   subscript.   In  Andrews  (1990b)  notation, 

let     W  _.    =  v.,    W_,    =  z..      Vf     =  V,      q  =  d,      x  =   l(vey)h(v),      k     =  dim(v), 
aiLiiLia  a 

m(w  ,t)  =  t(v),   and  g(w)  =  M(z).   Note  that  Andrews  Assumption  F  ii)  is 
a 

satisfied  by  hypothesis.   Also  by  hypothesis,   t  =  l(v€V)h  (v)   has 
derivatives  of  up  to  order  q  on  W  ,   and  for   |X|  ^  q,   sup  ^   \D   t(v)  -

a 

d\q(v)|  =  o  (1).   Let  T   =  {t(v)  :  sup^^^  |d\(v)  |  <  ^^\^y^   IP'^t^Cv)  |  +  1 

a  a 

■\  2    1/2 

for  all  X     with   Ul  ^  q}.   Then  sup  cT^^|xI<  "^W   '^  ""^(^^l  "^^^    <  "•   giving 

a 
Andrews  Assumption  F  iii).   Also,  by  construction,   m(w  ,t)  =  t(v)   is  zero 

(and  hence  constant)  outside  W  ,      giving  Andrews  Assumption  F  iv).   Also,   s' 

cL 

>  2/1/(^-2)   implies  ^l  >   2s'/(s'-2),   so  that  Andrews  Assumptions  v)  and  vi) 
are  satisfied.   Finally,   t(v)  e  T     w.p.a.  1,  and  /.-[tCvj-tCv)  ]  dv  -^  0,   so 
that  the  conclusion  follows  by  Theorem  II. 7  of  Andrews  (1990b). 

Proof  of  Theorem  5.3:   Let  iii  O)  =  y.",m(z. ,  g,  h„(v)  )/n,   mO)  = 

n      ^1=1    1    0 

E[m(z,p.h„(v))],  QO)  =  iii  0)'Wiii  (p),   and  QO)  =  m{/3)'WmO).   Under 
0  n      n 

Assumption  5.6  a).  Assumption  5.2  implies  that  for  each  p,      llm  0)-m  0)ll  -^ 

0,   while  by  z.   ergodic,   iii  O)  -^  mO),   implying  m  O)  -^  m  (p),   so 

that  QO)  -^   QO).   Noting  that   QO)   is  convex  by  iii  O)   convex,  the 

conclusion  then  follows  from  QO)  uniquely  minimized  at  p  ,   as  in  Anderson 

and  Gill  (1982).   Under  Assumption  5.6  b),   sup„  ^llrii  0)-iii  0)ll  -^  0  by 

p€jD  n     n 

Assumption  5.2,   while  sup  .^llm  0)-mO)ll  -^   0  follows  by  Andrews  (1987), 
so  that  sup  _IIQO)-QO)ll  -^   0.   The  conclusion  now  follows  by  the  Wald 
argument  for  extremum  estimators.   ■ 

Proof  of  Theorem  5.4:   By  Lemmas  5.1  and  5.2,  Asymptotic  Linearity  and 

Stochastic  Equicontinuity  are  satisfied,  and  by  orthogonality  of  M(z)  with 

h(z)  and  h(z),   v'iiJT'Kz)  [h(v)-h(v)  ]dF(z)  =  0.   Then  by  the  a-mixing  central 

limit  theorem  of  White  and  Domowitz  (1984),  Vnrii  0-)  =  T.    ,m./V^  +  o  (1)  — > 

n  0    ^1=1  1       p 

N(0,n).   The  remainder  of  the  proof  then  follows  from  a  standard  minimum 
distance  argument,  such  as  that  in  Newey  (1988b).   ■ 

Proof of Theorem 5.5: It follows by Lemma 5.1 that Asymptotic Linearity is satisfied, so that by Asymptotic Differentiability, the triangle inequality, and the $\alpha$-mixing central limit theorem of White and Domowitz (1984), $\sqrt n\,\hat m_n(\beta_0) = \sum_{i=1}^{n}u_i/\sqrt n + o_p(1) \xrightarrow{d} N(0,\Omega)$. The remainder of the proof then follows by a standard minimum distance argument, such as that in Newey (1988b).    ■


Proof  of  Theorem  5.6:   By  a  mean  value  expansion, 

(A. 2)   X-^Jlin--m.ll^/n  £  ( ll^-^_ll^+sup  llh(v)-h(v)  ll^)j;."  fb,  .  (z.  )^+  b..  (z,  )^]/n 
^1  =  1   11         "^  "^0      V  ^1  =  1   10   1     01   i 

=  0  (€^). 
P  n 

Therefore,      T.    ,llu.-u.ll   /n  =  0    (e    ).      Let     Q„  =  ).    ,u.u'    „/n,      Q  =  Q„      in  case 
^1=1      11  P     n  I       ^1=1    1    i+i  0 

a),      and     Q  =  n     +  E„^.,w(£,  L)  [n„  +  Q']      in  case  b).      Note      llu.u'.^.  -  u.u'^.ll    ^ 

llu.-u.  Illlu.    „-u.    Jl   +   llu.llllu.    „-u.    .11   +   llu.    Jlllu.-u.  II,      so   that  for  all     1^0, 
1      1        i+t     i+£  1        i+l     i+i  i+£       1      1 

by the Cauchy-Schwarz and Markov inequalities,

(A. 3)  lin„  -  nJI    i   {I^"fllG.-u.ll^/n}^''^{j:"~fllG.^,-u.^„ll^/n}^^^ 

t         i  ^1=1      1      1  ^1=1      i+£      i+t 

,,-n-t,,      „2  .    ,l/2,^-£,-  „2  .    ,1/2 

+    <r.    JIu.ll   /n>        {Y.    JIu.    „-u.    „ll   /n} 
^1=1      1  ^1=1      i+£      i+£ 

_^    .^-i,,  ,,2  .    .l/2.Ji-£    '  ,,2,    .1/2 

^^  =  l"''i+£"   '^"^        ^^  =  l"'^i""i"   ^"^ 

i  y."jlu.-u.ll^/n  +   {y.^Jlu.ll^/n}^'^^{y:.^JlG.-u.ll^/n}^^^  =  0    (€    ). 
^1  =  1      1      1  ^1  =  1      1  ^1  =  1      11  p     n 


In  case  a),       lin  -  nil    =    lin„  -  n^ll   =0    (e    )    =  o    (1),      while      lin  -  nil      follows  by 

0    0     p  n     p 

the  law  of  large  numbers,  giving  the  conclusion.   In  case  b),  there  is  a 

sequence  of  numbers  6  ^0  such  that  for  L'  =  6  /e  ,   Prob(L  :£  L'  )  — ^  1, 

n  n  n 

where L' can be chosen as an integer by the argument from Newey (1990b).

Then  by  boundedness  of  w(£,L)   and  eq.  (A.  3),   with  probability  approaching 

one  lin  -  nil  s   ||5  -  5  II  +  cr.^jin.-njl  ^    (1+CL')0  (€  )  =  o  (l).   AIso,   l/'/n  = 
0    0     ^£=1   £  £  p  n     p 

0(e  ),   so by Davydov's inequality, arguing as in Kool (1988), it follows that
for  n^  =  n  +  J]fi_.,w(£,  L)  [n.  +  n'],   with  probability  approaching  one,   iin  - 
a  II  s   \\n  -n   W   +  CX;„^  lin„  -  n.ll  =  O  (L'/Vn)  =  o  (l).   Finally,   applying  the
dominated  convergence  theorem  as  in  Newey  and  West  (1987),  it  follows  by  L 

-^  00  that  wn^+Y.^^'^u.Din^  +  n^]  -  nil  =  o  (1).  ■ 
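
The weighted autocovariance estimator analyzed in this proof, $\hat\Omega = \hat\Omega_0 + \sum_{\ell=1}^{L}w(\ell,L)[\hat\Omega_\ell + \hat\Omega_\ell']$ with $\hat\Omega_\ell = \sum_{i=1}^{n-\ell}\hat u_i\hat u_{i+\ell}'/n$, can be sketched for scalar $\hat u_i$ as follows. The Bartlett weights $w(\ell,L) = 1 - \ell/(L+1)$ used here are an assumption borrowed from Newey and West (1987); the proof only uses boundedness of $w(\ell,L)$. The function names are hypothetical.

```python
def autocov(u, lag):
    """Omega_hat_lag = sum_{i=1}^{n-lag} u_i * u_{i+lag} / n for scalar u."""
    n = len(u)
    return sum(u[i] * u[i + lag] for i in range(n - lag)) / n

def weighted_variance(u, L):
    """Omega_hat_0 plus weighted autocovariances up to lag L.

    For scalar u the term Omega_hat_l + Omega_hat_l' equals 2 * Omega_hat_l.
    Bartlett weights w(l, L) = 1 - l / (L + 1) are assumed.
    """
    omega = autocov(u, 0)
    for lag in range(1, L + 1):
        weight = 1.0 - lag / (L + 1.0)
        omega += weight * 2.0 * autocov(u, lag)
    return omega
```

Setting `L = 0` recovers case a), where only $\hat\Omega_0$ is used. With Bartlett weights the result is nonnegative by construction, which is one reason they are a common default.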

Proof of Theorem 6.1: To show consistency, note first that iii) implies that $K^{1/2}\zeta_0(K)/\sqrt n = o(1)$ and $K^{1/2}\zeta_0(K)K^{-d/r} = o(1)$, so that Assumption 5.2 is satisfied by Theorem 6.1 of Newey (1991). Consistency of $\hat\beta$ then follows by Theorem 5.3. Let objects without subscripts denote vectors of observations.
Asymptotic  Linearity  follows  by  Lemma  5.1  and  iv),  since  by  Theorem  6.1  of 
Newey  (1991),   llg-gll  /n  =  0  (.K/Vn   +  K     )  =  o  (n   02    ) ,   so  Assumption 
5. 4  is  satisfied. 

Next, Asymptotic Differentiability is shown, with $\alpha_j(z)$ as derived in Section 4. For notational convenience the $j$ subscript will be dropped and $M(z)$ treated as a scalar (for vector $M(z)$ the result follows by applying the following argument to each of its elements). Let $Q = p(p'p)^{-}p'$, $M = (M(z_1), \ldots, M(z_n))'$, and $\hat\delta = QM$. Then by $Q$ idempotent,

(A.4)     $M'(\hat g - g) - \delta'(y - g) = \hat\delta'y - M'g - \delta'y + \delta'g$

$= (\hat\delta-\delta)'(\hat g - g) + \delta'(\hat g - y) + (\hat\delta - M)'g = R_1 + R_2 + R_3.$

By  Theorem  6.1  of  Newey  (1991)  and  iii), 

iRjI/Vn  s  Vnlia-5lllli-gll   =  0   (.VniK.'^^^/Vn  +  k'^S^'^)  (K^^^/'/n  +  k"^'^)  )   =  o    (1). 

K     K 
By  Lemma  8.1  of  Newey  (1991),  there  are  g„(x)  =  p  (x)'7i   and  5  (x)  = 

K  g         K. 

p^(x)'7r^  such  that   llg(x)-g„(x)  IL  £  CiC^'^     and   Il5(x)-5^,(x)  ll„  ^   CK~^5'^'^. 
O  K     U  Is.  V 

Then  by  Q  idempotent. 
(A.  5)      IR^I/V^  =  15'  (I-Q)yl/V^  :£  I  (6-6- )'  (y-g)  |/v^  +  I  (6-6^)'Q(y-g)  |/v/H
*  I (6-6^)' (I-Q)(g-g^)|/Vn  =  R^,  -  R22  -  R23- 


By   I-Q   idempotent  and  K  i  K  with  probability  approaching  one,   R   ^ 
116-6- II  llg-g-ll/Vn  =  0  (Vnk~^^*^6^'''^)  =  o  (1).   By  Lemma  9.8  of  Newey  (1991), 
(y-g)'Q(y-g)/n  =  0  (iC/n).   so  that  R^^  ^   II5-6-II  [  (y-g) 'Q(y-g)/n]  ^'^^  = 

0  (K   K   6   )  =  o  (1).   By  the  strong  mixing  hypotheses,  Davydov' s 

_     2 

inequality,  and  Assumption  6.4,  for  p  =  1-2/ji  and  K  =  [K,K],   R  £ 

1  (6-6-)' (y-g)|^/n  =  Op(i:^E[  |  (6-6^,) '  (y-g)  |^]/n)  =  Op(j:j^E[  |  (6-5^) '  (y-g)  |^]/n)  ^ 

Op(j:j^|(6(x)-6j,(x))(y-g(x))|^)  =  0p(E^|(5(x)-6j,(x))(y-g(x))|^)  =  Op(I^K"^V''). 

Then  since  2d_/a  >  1   follows  from  K   C^^^^K   6    converging  to  zero, 
o  U 

y^K   6   =  0(1)   follows,  implying  R   =  o  (1).   Note  also  that  R   has  the 
J\  ^1     p  3 

same  form  as  R  ,   with  y  and  M   interchanged,  so  that  R  /V^  =0  (1)   also 

follows.   Finally,  note  that   6(x)   is  bounded,  and   |e|   <  cd  for  s  > 

2^/((j-l),   so  that   |a(z)|   <  oa  for   s  >  2/i/((j-l),   so  that  all  of  the 

hypotheses  of  Theorem  5.5  are  satisfied  and  the  first  conclusion  follows  from 

its  conclusion. 

To prove the second conclusion, note first that by Theorem 6.1 of Newey (1991), $|\hat g(x)-g(x)|_\infty = O_p(K^{1/2}\zeta_0(K)[K^{1/2}/\sqrt{n} + K^{-d/r}]) = O_p(\epsilon_n)$. Therefore, by Theorem 5.6, it only remains to be shown that $\sum_{i=1}^n \|\hat\alpha_i-\alpha(z_i)\|^2/n = O_p(\epsilon_n^2)$. It suffices to show this result for each element of $\alpha$, and hence $\alpha$ can be assumed to be scalar without loss of generality. Let $\hat\varepsilon_i = y_i-\hat g(x_i)$, $\varepsilon_i = y_i-g(x_i)$, $\hat M_i = \partial m(z_i,\hat\beta,\hat h(x_i))/\partial h$, $\hat M = (\hat M_1,\ldots,\hat M_n)'$. Then

(A.6) $\sum_{i=1}^n \|\hat\alpha_i-\alpha(z_i)\|^2/n \le \sum_{i=1}^n \|\hat M'p(p'p)^{-}p_i(\hat\varepsilon_i-\varepsilon_i)\|^2/n + \sum_{i=1}^n \|(\hat M-M)'p(p'p)^{-}p_i\varepsilon_i\|^2/n + \sum_{i=1}^n \|(M'p(p'p)^{-}p_i-\delta_i)\varepsilon_i\|^2/n = R_1 + R_2 + R_3.$
By 6 s 2 and $Q$ idempotent, $\hat M'Q\hat M/n \le \hat M'\hat M/n = O_p(1)$. Therefore, by Theorem 6.1 of Newey (1991),

(A.7) $R_1 \le \sup_x |\hat g(x)-g(x)|^2\,\hat M'Q\hat M/n = O_p(\epsilon_n^2).$
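The step $\hat M'Q\hat M \le \hat M'\hat M$ uses only that $Q$ is a symmetric idempotent (projection) matrix. The following is a minimal numerical sketch of that fact, not part of the original argument: the design points `xs`, the two-term basis $(1, x)$, and the vector `M` are made-up illustration values, and the helper names are hypothetical.

```python
# Sketch: Q = P (P'P)^{-1} P' is idempotent, so 0 <= M'QM <= M'M.
# All numbers below are illustrative assumptions, not data from the paper.

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(s):
    """Inverse of a 2x2 matrix via the explicit formula."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def projection_matrix(p):
    """Q = P (P'P)^{-1} P' for an n x 2 matrix P with full column rank."""
    pt = transpose(p)
    return matmul(matmul(p, inv2(matmul(pt, p))), pt)

def quad_form(m, q):
    """m' q m for a vector m and a square matrix q."""
    n = len(m)
    return sum(m[i] * q[i][j] * m[j] for i in range(n) for j in range(n))

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]           # hypothetical regressors
P = [[1.0, x] for x in xs]                    # basis p(x) = (1, x)'
M = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]          # hypothetical vector M

Q = projection_matrix(P)
QQ = matmul(Q, Q)
idempotent_gap = max(abs(QQ[i][j] - Q[i][j])
                     for i in range(len(xs)) for j in range(len(xs)))
mqm = quad_form(M, Q)                         # M'QM
mm = sum(v * v for v in M)                    # M'M
```

Here `idempotent_gap` measures how far $QQ$ is from $Q$ in floating point, and `mqm` and `mm` are the two sides of the inequality used before (A.7).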

Also, by & £ 1, 6 £ 1 and a Taylor expansion, $(\hat M-M)'Q(\hat M-M)/n \le \|\hat M-M\|^2/n = O_p(\epsilon_n^2 n^{-2/s})$, so

(A.8) $R_2 \le (\max_{i\le n}\varepsilon_i^2)\sum_{i=1}^n \|(\hat M-M)'p(p'p)^{-}p_i\|^2/n = O_p(n^{2/s})\,(\hat M-M)'Q(\hat M-M)/n = O_p(\epsilon_n^2).$

Finally, note that $M'p(p'p)^{-}p_i$ is $\tilde\delta(x_i)$, where $\tilde\delta(x)$ is the series estimator of $\delta(x)$ from regressing $M$ on $p^K$. Thus, by Theorem 6.1 of Newey (1991),

(A.9) $R_3 \le (\max_{i\le n}\varepsilon_i^2)\sum_{i=1}^n \|\tilde\delta(x_i)-\delta(x_i)\|^2/n = O_p(\epsilon_n^2).$ ■


Proof of Theorem 6.2: First, consider the case where Assumption 6.7 a) is satisfied. Consistency of $\hat\beta$ follows as in the proof of Theorem 6.1, noting that $\alpha = d/r$. Next, it follows by Theorem 6.1 of Newey (1991) and Assumption 6.6 that $\sqrt{n}\,\|\hat h(v)-h_0(v)\|^2 \le C\sqrt{n}\,\|\hat g(x)-g(x)\|_\Delta^2 = O_p(\sqrt{n}\,\zeta_\Delta(K)[K/n+K^{-2\alpha}]) = o_p(1)$, so that Asymptotic Linearity follows by the same Taylor expansion argument as used in the proof of Lemma 5.1. To show Asymptotic Differentiability, let

$\Psi_K = \int \delta(x)p^K(x)\,dF(x)$, $\Sigma_K = \int p^K(x)p^K(x)'\,dF(x)$, $\pi_K = \Sigma_K^{-1}E[p^K(x)g(x)]$,

$\delta_K(x) = p^K(x)'\Sigma_K^{-1}\Psi_K$, $g_K(x) = p^K(x)'\pi_K$, $h_K(v) = A(v,g_K) = A(v)'\pi_K$,

$\tilde\Psi = \sum_{i=1}^n M(z_i)A(v_i)'/n.$


Note that $\hat h(v)$ is invariant to nonsingular linear transformations, so that without loss of generality $p_{1K}(x), p_{2K}(x), \ldots$ can be assumed to be the functions in the conclusion of Lemma 8.4 of Newey (1991), for which the smallest eigenvalue of $\Sigma_K$ is bounded below and there is a constant $C$ such that $\sup_x |D^\lambda p_{kK}(x)| \le C\zeta_\Delta(K)$. Then, by Assumption 6.6,

(A.10) $\max_{k\le K}\|A(v,p_{kK})\|_0 \le C\zeta_\Delta(K).$


It then follows by $|M(z)|_s$ finite for $s > 2\mu/(\mu-1)$ and Lemma 9.6 of Newey (1991) that

(A.11) $\|\tilde\Psi-\Psi_{\hat K}\| = O_p(K^{1/2}\zeta_\Delta(K)/\sqrt{n}) = o_p(1)$, $\|\hat\Sigma-\Sigma\| = O_p(K\zeta_0(K)^2/\sqrt{n}) = o_p(1).$

Also, as in the proof of Lemma 9.8 of Newey (1991),

(A.12) $\|\Sigma^{-1/2}p'(y-g)/\sqrt{n}\| = O_p(K^{1/2}).$

Next, by Lemma 8.1 of Newey (1991) there exists $\tilde g_K(x) = p^K(x)'\tilde\pi$ such that $\|g(x)-\tilde g_K(x)\|_\Delta \le CK^{-\alpha}$. Note that $\|\tilde\pi-\pi_K\| \le C[(\tilde\pi-\pi_K)'\Sigma(\tilde\pi-\pi_K)]^{1/2} = C|\tilde g_K(x)-g_K(x)|_2 \le C[\,|g(x)-\tilde g_K(x)|_2 + |g(x)-g_K(x)|_2\,] \le C|g(x)-\tilde g_K(x)|_2 \le CK^{-\alpha}$. Therefore,

(A.13) $\|g(x)-g_K(x)\|_\Delta \le \|g(x)-\tilde g_K(x)\|_\Delta + \|\tilde g_K(x)-g_K(x)\|_\Delta \le C(K^{-\alpha} + \|p^K(x)\|_\Delta\|\tilde\pi-\pi_K\|) \le CK^{1/2}\zeta_\Delta(K)K^{-\alpha}.$


By the definition of the least squares coefficients $\pi_K$ and Lemma 8.1 of Newey (1991) it follows that $E[p^K(x)\{g(x)-g_K(x)\}] = 0$ and $|g(x)-g_K(x)|_2 \le CK^{-\alpha}$, so that under uniform mixing, for $p = [p^K(x_1),\ldots,p^K(x_n)]'$,

(A.14) $E\big[\sum_{K\in\mathcal{K}}\|p'(g-g_K)/\sqrt{n}\|^2\big] \le C\sum_{K\in\mathcal{K}} E[\|p^K(x)\|^2(g(x)-g_K(x))^2] \le C\sum_{K\in\mathcal{K}} K\zeta_0(K)^2K^{-2\alpha},$


which  converges  to  zero  by  i).   Without  uniform  mixing,  it  follows  by  strong 
mixing  and  boundedness  of  p  (x),  g(x),   and  g„(x)   that 
which converges to zero by ii) b). Then, since $\|p'(g-g_{\hat K})/\sqrt{n}\|^2 \le \sum_{K\in\mathcal{K}}\|p'(g-g_K)/\sqrt{n}\|^2$ with probability approaching one, it follows by the Markov inequality that $\|p'(g-g_{\hat K})/\sqrt{n}\|^2 = o_p(1)$.

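Now, it follows by a little arithmetic that

(A.16) $\sum_{i=1}^n M(z_i)[\hat h(v_i)-h(v_i)]/n = (\tilde\Psi-\Psi_{\hat K})'\hat\Sigma^{-1}p'(g-g_{\hat K})/n + \Psi_{\hat K}'(\hat\Sigma^{-1}-\Sigma^{-1})p'(g-g_{\hat K})/n$

$\quad + \Psi_{\hat K}'\Sigma^{-1}p'(g-g_{\hat K})/n + (\tilde\Psi-\Psi_{\hat K})'\hat\Sigma^{-1}p'(y-g)/n + \Psi_{\hat K}'(\hat\Sigma^{-1}-\Sigma^{-1})p'(y-g)/n$

$\quad + (\delta_{\hat K}-\delta)'(y-g)/n + \sum_{i=1}^n \big(M(z_i)[h_{\hat K}(v_i)-h(v_i)] - \int M(z)[h_{\hat K}(v)-h(v)]\,dF(z)\big)/n$

$\quad + \int M(z)[h_{\hat K}(v)-h(v)]\,dF(z) + \delta'(y-g)/n = \sum_{j=1}^8 R_j + \delta'(y-g)/n.$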

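By $\|\hat\Sigma-\Sigma\| = o_p(1)$ and $|\lambda_{\min}(\hat\Sigma)-\lambda_{\min}(\Sigma)| \le \|\hat\Sigma-\Sigma\|$, the largest eigenvalue of $\hat\Sigma^{-1}$ is bounded in probability, so that $\|(\tilde\Psi-\Psi_{\hat K})'\hat\Sigma^{-1}\| \le \|\tilde\Psi-\Psi_{\hat K}\|\,O_p(1) = o_p(1)$. Also, $\|\Psi_{\hat K}'\Sigma^{-1}\| \le C[\Psi_{\hat K}'\Sigma^{-1}\Psi_{\hat K}]^{1/2} \le CE[\delta(x)^2]$, so that $\|\Psi_{\hat K}'\Sigma^{-1}\| = O(1)$, and hence $\|\Psi_{\hat K}'(\hat\Sigma^{-1}-\Sigma^{-1})\| \le \|\Psi_{\hat K}'\Sigma^{-1}\|\,\|(\Sigma-\hat\Sigma)\hat\Sigma^{-1}\| = o_p(1)$. It now follows by $\sqrt{n}\,\|p'(g-g_{\hat K})/n\| = o_p(1)$ that $\sqrt{n}R_j = o_p(1)$, $(j = 1, 2, 3)$. Also, by $\|\Sigma^{-1/2}p'(y-g)/\sqrt{n}\| = O_p(K^{1/2})$, it follows similarly from eq. (A.11) and iii) that $\sqrt{n}R_4 = o_p(1)$ and $\sqrt{n}R_5 = o_p(1)$.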
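The eigenvalue step above rests on the perturbation inequality $|\lambda_{\min}(\hat\Sigma)-\lambda_{\min}(\Sigma)| \le \|\hat\Sigma-\Sigma\|$ (spectral norm), which then bounds $\|\hat\Sigma^{-1}\| = 1/\lambda_{\min}(\hat\Sigma)$. A small sketch for symmetric $2\times 2$ matrices, where eigenvalues have a closed form, is given below. The two matrices are made-up illustration values and the helper names are hypothetical; this only illustrates the inequality and is not part of the proof.

```python
import math

# Sketch: eigenvalue perturbation bound for symmetric 2x2 matrices.
# A symmetric matrix [[a, b], [b, d]] is stored as the tuple (a, b, d).

def eig_sym2(a, b, d):
    """Return (smallest, largest) eigenvalue of [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def spectral_norm_sym2(a, b, d):
    """Spectral norm of a symmetric matrix = largest absolute eigenvalue."""
    lo, hi = eig_sym2(a, b, d)
    return max(abs(lo), abs(hi))

sigma = (2.0, 0.3, 1.0)        # hypothetical population matrix Sigma
sigma_hat = (2.1, 0.25, 0.9)   # hypothetical estimate Sigma-hat
diff = tuple(h - s for h, s in zip(sigma_hat, sigma))

lam_min, _ = eig_sym2(*sigma)
lam_min_hat, _ = eig_sym2(*sigma_hat)

gap = abs(lam_min_hat - lam_min)      # |lambda_min(S-hat) - lambda_min(S)|
bound = spectral_norm_sym2(*diff)     # ||S-hat - S||
inv_norm_hat = 1.0 / lam_min_hat      # largest eigenvalue of S-hat^{-1}
```

When `bound` is smaller than `lam_min`, the inequality gives `inv_norm_hat <= 1 / (lam_min - bound)`, which is the sense in which the largest eigenvalue of $\hat\Sigma^{-1}$ stays bounded once $\|\hat\Sigma-\Sigma\|$ is small.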

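Next, note that by either uniform mixing or the bound on the conditional covariances, $E[\{(\delta_K-\delta)'(y-g)\}^2/n] \le CE[\{1+\varepsilon^2\}\{\delta_K(x)-\delta(x)\}^2] \le CE[\{1+E[\varepsilon^2|x]\}\{\delta_K(x)-\delta(x)\}^2] \le C\epsilon_\delta(K)^2$, so that by Assumption 6.4 and iii),

(A.17) $n|R_6|^2 = O_p\big(\sum_{K\in\mathcal{K}} E[\{(\delta_K-\delta)'(y-g)\}^2/n]\big) = O_p\big(\sum_{K\in\mathcal{K}} \epsilon_\delta(K)^2\big) = o_p(1).$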

Similarly, note that by eq. (A.13), $|h_K(v)-h(v)|_\infty \le CK^{1/2}\zeta_\Delta(K)K^{-\alpha}$, so that by strong mixing, Davydov's inequality, and i), for $p = 1-2/\mu$,

(A.18) $n|R_7|^2 = O_p\big(\sum_{K\in\mathcal{K}} \|M(z)[h_K(v)-h(v)]\|_\mu^2\big) = O_p\big(\sum_{K\in\mathcal{K}} K\zeta_\Delta(K)^2K^{-2\alpha}\big) = o_p(1).$

Furthermore, by Assumption 4.1, $\Psi_K = E[\delta(x)p^K(x)]$, so that by $\int \delta(x)g_K(x)\,dF(x) = \int \delta_K(x)g_K(x)\,dF(x) = \int \delta_K(x)g(x)\,dF(x)$,

(A.19) $\sqrt{n}\,|R_8| = \sqrt{n}\,\big|\int \delta(x)g_{\hat K}(x)\,dF(x) - E[\delta(x)g(x)]\big| = \sqrt{n}\,\big|\int [\delta(x)-\delta_{\hat K}(x)][g_{\hat K}(x)-g(x)]\,dF(x)\big| \le \sqrt{n}\,\epsilon_\delta(\hat K)\hat K^{-\alpha} \le \sqrt{n}\,\epsilon_\delta(\underline{K})\underline{K}^{-\alpha} \to 0,$

where the last inequality follows by $\epsilon_\delta(K)$ monotonically decreasing in $K$.

For the case where Assumption 6.7 b) is satisfied, it follows by Lemma 8.2 of Newey (1991) that all the previous arguments hold with $\alpha$ and $\alpha_\delta$ replaced by any (arbitrarily large) positive number $a$. It then follows by $K$ bounded by a power of $n$, $\zeta_\Delta(K)$ bounded by a power of $K$, and Assumption 6.4 that all terms above in which a power of $K$ with exponent depending on $\alpha$ or $\alpha_\delta$ appears are small, so that all the terms depending on $\alpha$ or $\alpha_\delta$ in the statement of the Theorem can be ignored, i.e. $\alpha$ and $\alpha_\delta$ can be set to $+\infty$, again giving the first conclusion.

The second conclusion will be shown only under case a) of Assumption 6.7, because under case b) the result will follow as above. For notational convenience, suppress the $j$ subscript on each $h_j(v)$. It follows from Theorem 6.1 of Newey (1991) that $|\hat h(v)-h(v)|_\infty = O_p(K^{1/2}\zeta_\Delta(K)[K^{1/2}/\sqrt{n}+K^{-\alpha}]) = O_p(\epsilon_n)$. Therefore, by Theorem 5.6, it only remains to be shown that $\sum_{i=1}^n \|\hat\alpha_i-\alpha(z_i)\|^2/n = O_p(\epsilon_n^2)$. Let $\hat\Psi = \sum_{i=1}^n m_h(z_i,\hat\beta,\hat h(v_i))A(v_i)'/n$ and define $\epsilon_\Psi = K^{1/2}\zeta_\Delta(K)[K^{1/2}/\sqrt{n} + K^{-\alpha}]$ if $m$ is linear in $h$ and $\epsilon_\Psi = K\zeta_\Delta(K)^2[K^{1/2}/\sqrt{n} + K^{-\alpha}]$ otherwise. By eq. (A.10), $\sup_v\|A(v)\| \le CK^{1/2}\zeta_\Delta(K)$, so that by an expansion,

(A.20) $\|\hat\Psi-\Psi_{\hat K}\| \le \|\tilde\Psi-\Psi_{\hat K}\| + \big|\|A(v)\|\big|_\infty \big\{\|\hat\beta-\beta_0\|\sum_{i=1}^n \|m_{h\beta}(z_i,\beta,h(v_i))\|/n + |\hat h(v)-h_0(v)|_\infty \sum_{i=1}^n \|m_{hh}(z_i,\beta,h(v_i))\|/n\big\} = O_p(\epsilon_\Psi).$

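Also, it follows as in the proof of eq. (A.11) that $\|\Psi_{\hat K}'\Sigma^{-1}(\Sigma-\hat\Sigma)\hat\Sigma^{-1}\| = O_p(\epsilon_\Sigma)$ for $\epsilon_\Sigma = K\zeta_0(K)^2/\sqrt{n}$ and $\|\Psi_{\hat K}'\Sigma^{-1}\|$ is bounded, so that $\|\hat\Psi'\hat\Sigma^{-1}-\Psi_{\hat K}'\Sigma^{-1}\| \le \|(\hat\Psi-\Psi_{\hat K})'\hat\Sigma^{-1}\| + \|\Psi_{\hat K}'\Sigma^{-1}(\Sigma-\hat\Sigma)\hat\Sigma^{-1}\| = O_p(\epsilon_\Psi+\epsilon_\Sigma) = o_p(1)$, and $\|\hat\Psi'\hat\Sigma^{-1}\| = O_p(1)$.

Next, note that

(A.21) $\sum_{i=1}^n |\hat\alpha_i-\alpha_i|^2/n \le C\sum_{i=1}^n |\hat\Psi'\hat\Sigma^{-1}p_i(\hat\varepsilon_i-\varepsilon_i)|^2/n + C\sum_{i=1}^n |(\hat\Psi'\hat\Sigma^{-1}-\Psi_{\hat K}'\Sigma^{-1})p_i\varepsilon_i|^2/n + C\sum_{i=1}^n |[\delta_{\hat K}(x_i)-\delta(x_i)]\varepsilon_i|^2/n = R_1 + R_2 + R_3.$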

Therefore,  by  Theorem  6.1  of  Newey(1991),  and  d/n,   i  a, 
(A.22) $|R_1| \le C\|\hat\Psi'\hat\Sigma^{-1}\|^2\,\big|\|p^K(x)\|\big|_\infty^2 \sum_{i=1}^n |\hat g(x_i)-g(x_i)|^2/n = O_p(K\zeta_0(K)^2[(K/n) + K^{-2\alpha}]) = O_p(\epsilon_n^2).$

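Also, $\sum_{i=1}^n \|\hat\Sigma^{-1/2}p_i\|^2/n = \sum_{i=1}^n \mathrm{tr}(\hat\Sigma^{-1/2}p_ip_i'\hat\Sigma^{-1/2})/n = \mathrm{tr}(\hat\Sigma^{-1/2}(p'p/n)\hat\Sigma^{-1/2})$ is equal to the dimension of $\hat\Sigma$, less than or equal to $K$ w.p.a.1, so

(A.23) $|R_2| \le C(\max_{i\le n}\varepsilon_i^2)\big(\sum_{i=1}^n \|\hat\Sigma^{-1/2}p_i\|^2/n\big)\big[\|(\hat\Psi-\Psi_{\hat K})'\hat\Sigma^{-1/2}\|^2 + \|\Psi_{\hat K}'\Sigma^{-1/2}\|^2\,\|\Sigma^{-1/2}(\Sigma-\hat\Sigma)\hat\Sigma^{-1/2}\|^2\big] = O_p(n^{2/s}K(\epsilon_\Psi+\epsilon_\Sigma)^2).$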
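The trace identity used just before (A.23) can be checked numerically. With $\hat\Sigma = p'p/n$ one has $\|\hat\Sigma^{-1/2}p_i\|^2 = p_i'\hat\Sigma^{-1}p_i$, and averaging over $i$ gives exactly the dimension of $\hat\Sigma$. The sketch below assumes a two-term basis $(1, x)$ and made-up design points; the function names are hypothetical and the example is only an illustration of the identity.

```python
# Sketch: sum_i p_i' S^{-1} p_i / n equals the dimension K of S = P'P/n.
# Here K = 2 with basis p(x) = (1, x)'; the x values are illustrative.

def second_moment(rows):
    """S = P'P / n for a list of K-vectors."""
    n = len(rows)
    k = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(k)]
            for a in range(k)]

def inv2(s):
    """Inverse of a 2x2 matrix via the explicit formula."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def average_leverage(rows):
    """Average of p_i' S^{-1} p_i, i.e. sum_i ||S^{-1/2} p_i||^2 / n."""
    s_inv = inv2(second_moment(rows))
    total = 0.0
    for r in rows:
        total += sum(r[a] * s_inv[a][b] * r[b]
                     for a in range(2) for b in range(2))
    return total / len(rows)

rows = [[1.0, x] for x in (-1.0, 0.0, 0.5, 2.0, 3.5)]   # hypothetical design
value = average_leverage(rows)   # equals K = 2 up to rounding error
```

The value does not depend on the particular design points as long as $\hat\Sigma$ is nonsingular, which is why the factor in (A.23) contributes only $K$ to the rate.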

Finally, it follows as in the proof of Theorem 6.1, using $E[\varepsilon^2|x]$ bounded, that $|R_3| = O_p\big(\sum_{K\in\mathcal{K}} E[\{\delta_K(x)-\delta(x)\}^2]\big) = O_p(\epsilon_n^2)$. ■

Proof of Theorem 7.1: The proof proceeds by verifying the hypotheses of Theorem 6.1. Assumption 5.1 holds by i). The estimator has the form of Section 6, where $m(z,\beta,\sigma,h) = 1(0<h_1<1,\,0<h_2<1)\,(x_2-x_1,\ \Phi^{-1}(h_1))'\,[\Phi^{-1}(h_2)-\sigma\Phi^{-1}(h_1)-(x_2-x_1)'\beta]$. Note that $x_t'\beta + \rho(x)$ is bounded, so that $h_t(x) = \Phi([x_t'\beta+\rho(x)]/\sigma_t)$ is bounded away from zero and one, and that $\Phi^{-1}(\cdot)$ is continuously differentiable on any set where its argument is bounded away from zero and one. It follows that
Assumptions  5.3  and  5.6  are  satisfied  for  case  a)  of  Assumption  5.6  and  ^ -u.  - 
CD,   1  <  j+k  £  2.   Assumptions  6.1  -  6.5  also  hold,  with  v  =  0  and  s  =  oo. 
Also, note that $M_j(z)$ is a bounded function of $x$ and $\mathcal{G}$ is the set of all mean-square integrable functions of $x$, so that $M_j(z) = \delta_j(x)$, and i) - ii) of Theorem 6.1 are satisfied with $d_\delta = d$. Noting that $\zeta_0(K) = K^{1/2}$, it follows by vi) that Theorem 6.1 iii) and iv) are satisfied, since each of

4r-i   r-r{d/2k)        ,      .s-^d/k  '  _,  ..  .    ,  . 

n    ,   n         ,  and  n         converge  to  zero.  The  first  conclusion  now 

follows by Theorem 6.1. Next, note $s = \infty$ by $y_j - h_j(x)$ bounded, so that Theorem 6.1 v) is implied by Theorem 6.1 iii), so $\epsilon_n \to 0$ and the second conclusion also follows by Theorem 6.1. ■


Proof  of  Theorem  7.2:   Follows  similarly  to  the  proof  of  Theorem  7.3  to 
follow,  on  noting  that   1)  Assumption  4.1  is  satisfied,  where  E[A(v,g)]  = 
E[J"^g(x  ,x^)dx  ]  =  £{d)E[g{x)f{x)] ,      1      is  the  Lebesgue  measure,  and  hence 
6(x)  =  ie(^)n(f(x)|g);   2)   :e(^)E[f(x)|Xj]  =  f(x^,X2)~^f(x2)   in  case  b),  where 
the  projection  has  an  explicit  form.    ■ 


Proof  of  Theorem  7.3:   The  proof  proceeds  by  verifying  the  hypotheses  of 
Theorem  6.2.   Assumption  5.1  holds  by  i).   The  estimator  has  the  form  of 
Section  6,  where  m(z,p,h)  =  g(v)-g(x)-^,   so  that  Assumptions  5.3  and  5.6  are 
satisfied  for  case  a)  of  Assumption  5.6  and  6 .   =  oo,   1  i  j+k  :£  2.   Also,  A  = 
0,   so  that  Assumption  6.7  a)  is  satisfied.   Let  h  (v  )  =  g(v),   h  (v  )  = 
g(x),   so  that   5  (x)  =  1.   To  discuss  6  (x),   note  first  that  by  Lemma  8.0 
of  Newey  (1991),  '§      is  closed,  so  that  IT(f(x)|§')   exists.   As  shown  in 
Section  4,  Assumption  4.1  is  satisfied  for  5  (x)  =  II(f(x)|§'),   so  that  under 
iii) b), it follows by Lemma 8.2 of Newey (1991) that $\epsilon_\delta(K) \le K^{-d_\delta/r}$. Under

iii)  a),  the  projection  has  an  explicit  form  n(M(z)|^)  =  E[M(z)|x  ]  + 
{x^-E[x^|x^]}(E[Var(x^|x^)])"-^E[{x^-E[x^|x^]}M(z)],   and  that  E[f(x)|x^]  = 

f  i(x^)~'^f  i(x-^).  so  that  part  b)  is  satisfied.   Noting  that  C„(K)  =  CAY.)   = 

1/2 
K     and  a  =  a  =  -d/a,      it  follows  by  iv)  that  the  hypotheses  of  Theorem 

.,.,,,       ,   .   iri3-2d/n)  1/2-y  (d+d-)/a    5r-l+(l/s) 

6.2  are  satisfied,  since  each  of  n        ,   n        5   ,   n         , 

and  n         converge  to  zero.   The  conclusions  now  follow  from  the 

conclusion  of  Theorem  6.2.    ■ 


Proof  of  Theorem  7.4:   First  the  result  will  be  proven  when  x   is  a  scalar. 

Assumption  5.1  holds  by  i ) .   The  estimator  has  the  form  of  Section  6,  where 

m(z,p,h)  =  m(z,9g(x)/Sx  )  -  p,   so  that  Assumptions  5.3  and  5.6  are  satisfied 

for  case  a)  of  Assumption  5.6  and  & .   =  oo,  Is   j+k  £  2.      As  shown  in  Section 

4,   5(x)  =  n(f(x)   9f(x)/9x  |&),   so  that  by  the  usual  mean-square  spanning 

result  for  polynomials,   e_(K)  — )  0     as  K  — >  oa.      Also,  since  Assumption  6.7 

o 

b)  is  satisfied,  none  of  the  conditions  of  Theorems  6.2  that  depend  on  a  or 

a   are  binding.   The  conclusion  now  follows  by  Theorem  6.2,  since  A  = 

^     u  r   u^   •   1-      •    u    ...  V,  ^v,    (l/s)+r[(7/2)+y]-(l/2) 

1  and  when  m(z,h)   is  linear  in  h  and  both  n  and 

(l/s)+r[(5/2)+2i^]-(l/2)     ^         u-n    *u    •     (l/s)+r(2+l+2i'+4)-(l/2) 
n  go  to  zero,  while  otherwise  n 

converges  to  zero,  so  the  conclusion  follows  by  Theorem  6,2.   ■ 